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This  report  is  organized  as  five  separate  reports 
the  five  tasks  of  the  contract.  Four  of  these  reports  are  in¬ 
cluded  in  the  present  document  plus  a  separate  report  written  by 
Prof.  C.  W.  Merriam,  III. 

The  first  section  of  this  report  provides  the  asymptotic 
theory  and  geometric  procedure  for  determining  the  amplitude 
of  the  crosscorrelation  function  for  FM  signals  whose  instan¬ 
taneous  frequency  curves  overlap  at  one  point  in  time.  In  addi¬ 
tion,  the  geometric  procedure  can  be  used  to  upperbound  the  cross- 

Y5e 

correlation  function  for  multiple  intecting  cases,  where  absolute 

A 

phase  of  the  signal  components  is  not  generally  known. 

/T 

The  second  section  provides  a  new  definition  of  wideband 
energy  density  function  which  allows  for  Doppler  stretch.  The 
key  to  this  definition  and  its  properties  lies  in  the  use  of  the 
Four ier-Me lion  transform  which  converts  time  stretching  to  a 
multiplicative  exponential  factor  in  the  transform  domain.  The 
relationship  between  the  wideband  energy  density  function  and  the 
wideband  ambiguity  function  is  also  established  using  a  modified 
two-dimensional  Fourier  transform.  Using  these  new  results, 
we  feel  we  can  now  extend  the  theory  of  wideband  signal  processing 

The  third  section  discusses  autoregressive  techniques  in¬ 
cluding  the  maximum  entropy  predictive  technique.  Although  no  new 
analytic  results  were  found,  the  extensive  simulations  show  that 
these  techniques  have  certain  difficulties,  particularly  in 
resolving  closely  spaced  echos  in  a  multipath  environment.  Thus, 


except  for  particular  circumstances,  these  techniques  do  not 
do  as  well  as  a  DFT.  Caution  should  be  used  in  applying  maximum 
entropy  estimation  to  periodicity  estimation.  Included  in  this 
section  is  a  complete  review  of  the  multiple  v.ays  of  viewing  the 
overall  problem  of  autoregressive  spectral  analysis. 

The  fourth  section  has  several  new  results  concerning  adap¬ 
tive  estimation  procedures.  An  exponentially  weighted  least 
squares  algorithm  (EWLS)  is  developed.  The  Widrow-Hoff  algorithm 
(LMS)  is  an  approximation  to  the  EWLS  algorithm.  Several  impor¬ 
tant  results  are  presented.  In  particular,  the  importance  of 
choosing  an  input  signal  sequence  whose  wideband  ambiguity  function 
is  insensitive  to  Doppler  scaling  is  established.  Other  results 
include  "misad j ustment  noise",  stability,  error  in  tracking  time- 
varying  parameters,  effects  of  additive  noise,  and  the  role  of 
the  input  signal. 

The  fifth  section,  included  as  a  separate  report,  covers 
other  gradient  algorithms  in  parameter  estaimtion.  Included  here 
is  a  discussion  of  the  effects  of  coefficient  averaging  and  the 
reduction  of  the  misadjustment  effects. 

The  only  area  in  which  we  feel  that  definitive  results  did 
not  occur  is  in  determining  the  effects  of  Doppler  stretch  on  the 
Wiener  solution.  The  difficulty  here  is  that  although  the  original 
signal  and  the  Doppler  scaled  signal  are  stationary,  they  are  not 
jointly  stationary.  Thus,  all  efforts  at  formulating  the  problem 
analytically  failed,  despite  the  fact  the  much  initial  effort 
was  devoted  to  this  problem. 
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to  the  determination  of  the  wideband  energy  density  function. 
Finally,  we  wish  to  thank  Darrell  Marsh  for  his  patience  in  allow¬ 
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I .  Introduction 

For  many  Navy  applications.  Frequency  Modulated  (FM)  signals 
which  have  large  time-bandwidth  product  are  being  considered  for 
underwater  acoustic  communications.  With  the  possibility  of  co¬ 
herent  processing  and  a  large  number  of  signals,  it  is  clearly 
desirable  to  find  ways  of  evaluating  the  mutual  crosscorrelation 
functions  of  various  signals.  Using  the  method  of  stationary  phase 
we  will  derive  an  asymptotic  formula  for  the  crosscorrelation  of 
two  FM  signals  whose  instantaneous  frequency  curves  cross  at  only 
one  point  in  time.  For  this  case,  a  simple  geometric  procedure 
for  estimating  the  crosscorrelation  is  given  based  upon  the  over¬ 
lapping  area  of  two  templates  generated  in  time-frequency  space. 

For  signals  with  more  than  one  instantaneous  frequency  cros¬ 
sing,  the  geometric  procedure  provides  an  upperbound  on  the  cross¬ 
correlation,  since,  in  general  the  absolute  phase  between  the  two 
signals  may  not  be  known. 

Since  the  cross  ambiguity  function  is  a  crosscorrelations 
function  we  can  apply  this  procedure  directly  to  this  case. 


II .  Asymptotic  Theory. 

In  this  section  we  consider  the  problem  of  asymptotically 
estimating  the  crosscorrelation  function  between  two  FM  signals, 
based  upon  crossirgs  of  their  instantaneous  frequency  curves.  We 
shall  deal  with  complex  (analytic)  signals.  In  Appendix  A  we  es¬ 
tablish  the  relationship  between  the  crosscorrelation  of  the  real 
parts  of  the  analytic  signals  and  the  results  for  analytic  signals 
themselves . 

We  begin  by  defining  instantaneous  frequency.  Suppose  we  have 
an  analytic  signal  of  the  form 

f  (t)  =  avt)ej0(t)  (1) 

By  analytic  we  mean  that  the  Fourier  transform  of  f(t),  F(w), 
satisfies  the  condition 


F  (cj)  =  0,  oj  <  0 


(2) 


Thus  the  signal  has  a  one-sided  spectrum.  The  instantaneous 
frequency  of  f(t)  is  the  time-rate  of  change  of  its  phase,  thus 


wf  (t) 


=  a  9  (t) 

di_ 


v  3 ) 


Thus,  for  example  if 


for  a  typical  chirp  FM  signal,  then 


a)f  (t)  =  Bt  +  w 


We  assume  that  the  two  signals  of  interest  are  of  the 


same  form: 


f(t)  =  a(t)ejAa(t) 
g  (t)  =  b(t)e“jB6(t) 


(6-a) 


(6-b) 


and  the  cross  correlation  is  given  by 


=  /  f(t)  g*  (t+T)dt 


Without  loss  of  generality  we  assume  the  analytic  signals 
have  unit  energy.  Thus 


/  I f (t) | 2  dt 


-  /  |g(t 


)  dt  =  1 


Thus,  by  the  Schwarz  Inequality,  the  maximum  value  of  their  auto 
and  crosscorrelation  functions  is  unity.  Their  instantaneous 
frequencies  are  thus 


(1.4) 


The  two  constants,  A  and  B,  are  assumed  to  be  large. 
Substituting  equations  (6)  into  (7)  yields 


00 

Rf  It)  =  /  k(t,T)e«h(t,T)dt 

^  —CO 

where 

I 

i  k(t,T)=a(t)b  (t+x) 

Xh(t,t)  =  Aa(t)  +  B0(t+x) 


(10) 


(11-a) 

(11-b) 


If  we  assume  that  A>B,  without  loss  of  generality,  then  we 
have 


X  =  A 

h  (t  ,x  )  =  a  (t)  +  0  (t) 


( 12-a) 
( 12-b) 


Applying  the  method  of  stationary  phase  to  the  integral  (4) 
we  have,  as  an  asymptotic  expression 

1/2 

k  ( t , T )  exp  |  j x h  (t,x)  +  j -rr/4 ]  (13) 


Rfg(x)  - 


2tt 


X  d  h(t,x) 

2 


dt‘ 


where  the  stationary  point  t  is  the  unique  solution  of  the 
equation 

wf(t)  -  wg(t)  =  0  (14) 

It  is  assumed  that  g  is  continuous  and  h  is  twice  continuously 


differentiable  with 


d  h (t , t  ) 


>  0 


dt‘ 


Let  us  do  an  example  of  how  this  procedure  works.  We  will 
first  look  at  the  cross  correlation  between  two  linear  FM  signals, 
one  an  up  chirp  and  the  other  a  down  chirp.  We  let 


f  (t)  =  a  (t)  e 


j-t2 

J2 


g  (t)  =  b  (t)  e 


-j|(T-t)2,  0  <_  t  <_  T 


The  instantaneous  frequency  curves  for  the  two  signals  are 
shown  in  Figure  1. 

The  unit  energy  condition  reveals  that 
T  T 

J  a2(t)dt  =  /  b2 ( t) dt  =  1 
0  0 

Solving  for  the  stationary  point,  with  x=A  we  have  that 

h(t,T)  =  d  +  lar-T) 2 

and  differentiating 


d  h  ( t ,  x ) 
dt 


t  +  (t+t-T) 


Setting  the  derivative  of  h  equal  to  zero  and  solving  for  t  we 


obtain 


The  second  derivative  condition  is  satisfied  since 


d  h(t,T) 

at2 


2  >  0 


A  A 

Solving  for  k(t,x)  and  h(t,x)  we  have 


k (t ,x )  =  a(^l)  b 
Xh (t ,x )  =  |  (x-T)2 


Thus  we  have  as  an  estimate  of  R 


fg 


Rfg(x)  ~  a(^-L)  b  (^—-)  [J]  exp[j|(x-T)2  +  j tt/4 3 


If  we  further  assume  that 


a  (t )  =  b  (t)  =  —  ,  0  <  t  <  T, 

/T 


thus  the  signals  have  rectangular  envelones.  Then  in  the  region  of 
overlap  (-T<x<T)  we  have 


(x) 


i  f— 1  1/2 
T  1 A J 


We  observe  that  the  total  frequency  deviation  is  AT 


rad/sec.  or  2irF  rad/sec. 


We  see  that,  as  the  TB  product  gets  large  the  crosscorrela¬ 
tion  function  diminishes  as  the  reciorocal  square-root  of  the  TB 
product. 

In  order  to  check,  the  asymptotic  results,  several  examples 
were  run  on  our  Interdata  7-32  minicomputer.  The  signals  are  of 
the  form 

F  (I)  =  a  (I) exp[BI2+WI] ,  I  =  1,  N. 

The  amplitudes  are  rectangular  except  for  a  10%  raised  cosine 
at  either  end  of  the  signals.  Figures  2.  and  3.  show  the  two 
signals  for  N=128.  The  real  and  imaginary  parts  are  plotted 
consecutively  on  each  graph.  The  parameters  are 


a)  Upsweep 

b)  Downsweep 


B  =  0.003,  W  =  0.3 
B  =  -0.003,  W  =  1.068 


Calculating  the  timewidth  and  Bandwidth 


Tf  =  (0.9) (128)  =  115.2 
Ff  =  (0.9)  (Q^-68-)  =  0.110 


Thus  the  crosscorrelation  estimate  is 


/2TfFf 


0.1986 


Figure  4.  shows  the  actual  crosscorrelation  function  and 


the  asymptotic  estimate,  shown  as  a  straight  line. 

Suppose  the  10%  raised  cosine  window  is  not  included  then 

Tf  =  128 
Ff  =  0.768 

and  the  estimate  is  |  Rf g- 1  ~  0.179.  This  case  is  seen  in  figure 
5.,  where  the  actual  crosscorrelation  and  estimate  are  plotted 
together. 

A  second  example  with  larger  TB  product  is  given  below. 

Here  the  parameters  are 

N=256  a)  Upsweep  B  =  .003,  w  =  0.300 

b)  Downsweep  3  =  -0.003,  w  =  1.836 

Figure  6.  shows  the  first  half  of  the  crosscorrelation  functions 
of  the  signals  with  and  without  the  10%  raised  cosine. 


jre  4  •  Crosscorrelation  function  and  Asymptotic 
Estimate  of  Linear  FM  Up  and  Downsweep 
with  1C  %  raised  cosine  window.  t  =  0 
corresponds  to  Time  Delay  =  127. 


CCF  LINEPR  FM,  WITH  RSYMPT .  ESTIMRTE 
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00 


80.00 


200 . 00 


240.00 


280 . 00 
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Ill .  Geometric  Interpretation 

In  this  section  we  show  that  a  simple  geometric  interpreta¬ 
tion  may  be  applied  to  the  asymptotic  results  for  the  cross¬ 
correlation  function.  The  geometric  interpretation  can  be 
very  useful  in  calculating  the  crosscorrelation  function.  In 
particular  we  shop-  that  asymptotic  result  is  obtained  directly 
from  the  area  of  a  parallelogram  in  the  time-frequency  plane. 

We  begin  by  defining  a  parallelogram  in  terms  of  two  sets  of 
parallel  lines  in  the  t-w  plane.  Figure  7.  shows  the  4  lines. 
The  area  of  the  parallelogram  is 

Area  =  xy  sin  9  (15) 

We  must  calculate  the  three  quantities  separately.  To  obtain 
sin  9  we  observe  that 

Tan  a  =  A,  Tan  3  =  B. 

Note  that  is  defined  with  a  negative  slope.  Thus  we  have 

Tan (90 °-a)  =  i,  Tan(90°-S)  =  | 

Taking  inverse  tangents  and  adding  we  have 

Tan-1(i)  +  Tan-1  (|)  =  180  -  a  -  8  =  0 

Thus 

I  +  I 

Tan  9  =  Tan  [Tan_1(^)  +  Tan_1(i)]  =  -A-  . 

A  B 


and  from  this  we  obtain 


sin  0 


A  +  B _ 

/(A+B)  2+(AB-l)  2 


sin  0 


A  +  B 

/Ta2+d  (b2+I) 


(16) 


To  obtain  x  and  y  we  recall  that  the  diameter  of  the  in¬ 
scribed  circle  for  a  triangle  is  calculated  as  the  ratio  of  any 
side  to  the  sin  of  the  opposite  angle.  Thus 


C  _  X'.  d  __  y 

sin  0  ~  sin  a  '  sin  0  sin  0 


or 


v  -  c  sin  a  _  D  sin  8 

sin  0  '  Y  "  “sin  0 


but 


sin  a  = 


A 

/-^TT 


sin  8  = 

/b2+T 


(17) 


(18) 


Thus  combining  (16),  (17),  and  (18)  and  substituting  into  (15) 
we  obtain 


Area 


CD  (AB) 
A+B 


(19) 


Finally  we  observe  that 


and  substituting  we  have 


Area 


C  'D ' 
A+B 


(20) 


Here  C'  and  D'  are  the  respective  vertical  lengths  between 
the  two  sets  of  parallel  lines  and  A  and  B  are  the  tangents  of 
the  lines  (one  positive  and  one  negative) . 

Consider  now  the  magnitude  -  squared  of  equation  (13) ,  we 
obtain 


(t) 


2ir 


X 


d  n (t ,t) 

at2 


k2 ( t , t ) 


(21) 


which  is 


Rfg  (T) 


Al 


2?. 


dcjf(t,T) 

dt 


dWg  (t,T) 

dt 


(t , t )  bz  (t,x)  (22) 


If  we  now  assume  that  the  two  instantaneous  frequency  functions 
are  slowly  varying,  i.e.  that  we  may  replace  and  by  the  first 

A 

two  terms  of  their  Taylor  series  expantion  about  the  point  t, 
for  fixed  x,  then  we  may  observe  that  if 


C'  =  a2 (t , t ) 
D'  =  b2(t,t) 


which  are  also  assumed  to  be  slowly  varying,  and 


(1.19) 


1  d(Uf(t,T) 

2tT  dt 

A 

1  dwg(t,T) 
2tT  dt 


Then  we  have  that 


|Rfg(t) |  -  [Area] 


and  the  crosscorrelation  function  is  asymptotically  the  square- 
root  of  the  intersecting  parallelogram  described  above.  The 
results  are  clearly  true  for  the  two  linearly  frequency  modulated 
signals  provided  in  the  example,  since  all  the  assumption  con¬ 
cerning  slow  variations  are  true.  Although  we,  as  of  yet, 
have  not  been  able  to  prove  the  result  when  we  remove  the  slow 
variation  assumptions  we  feel  the  result  is  still  approximately 
valid.  Thus,  one  could  construct  a  template  for  each  signal 
such  as  seen  in  Figure  8. 

Then  overlaying  the  two  signal  templates,  and  calculating 
the  square-root  of  the  overlapping  area,  we  obtain  an  estimate 
of  the  cross  correlation  function  for  that  value  of  time  dif¬ 
ference.  Moving  the  template  horizontally  would  yield  a  new 
value  of  crosscorrelation. 

Although  it  may  seem  confusing  as  to  why  we  plot  the  ampli- 

A  A 

tude-squared  along  the  frequency  axis,  i.e.  C'  =  a  (t,x) ,  observe 

that  for  the  rectangular  amplitude  case,  as  in  the  example, 

2  ~ 

a  (t,t)  «  1/T  which  has  units  of  frequency,  so  that  the  results 


are  consistent 


IV.  Multiple  Intersections 

The  case  of  multiple  intersections  and  signals  is  considerably 
more  complicated.  The  reason  for  this  becomes  clear  when  we  consider 
a  simplified  case.  Suppose  that  for  a  particular  delay,  the 
instantaneous  frequency  curves  have  two  intersections.  Then  we 
may  calculate  the  crosscorrelation  contribution  from  each  inter¬ 
section,  but  since  the  actual  correlation  is  the  complex  sum 
of  the  two  since  the  phases  must  be  considered.  In  principle 
we  have  the  asymptotic  expression  for  the  phases  in  equation  (13) 
and  could  thus  calculate  the  complex  sum  of  the  two  contributions. 

But  if  the  two  signals  are  only  specified  by  their  time-frequency 
plots  and  amplitudes  we  lose  control  of  absolute  phase.  To  show 
this,  suppose  that  g(t)  is  the  same  as  equation  (6-b)  and 
f (t)  is 

f(t)  =  a1(t)ejAloti{t)  +  a2  (t)ejA2a  (t) 
or 

f(t)  =  fx(t)  +  f2(t) 

This  could  represent  a  signal  with  a  fundamental  and  second 
harmonic  component,  where,  for  example,  the  second  phase  could 
be 


(24) 

(25) 


A2a2(t)  =  2£Aiai(t)] 

then 

i 


Rf„(T)  *  Rf  „(T)  +  Ri  „(T)  . 


(26) 


(1.22) 

Using  stationary  phase  once  again  for  each  of  the  two  integrals, 


the  two  stationary  points  t^  and  are  the  unique  solutions  to 


the  equations 


wf  ( t x)  _  wg  (t1)  =  0 


(27-a) 


and 


wf2(t2>  ‘  wg(t2)  *  0 


(27-bl 


with 


X1h1(t,t)  =  A1a1(t)  +  B3(t) 


(28-a) 


X2h2(t,t)  =  A2a2(t)  +  B3(t) 


(23-V 


and 


k  (t,  )  =  a1(t)b(t+r) 


(29-a) 


k2  (t ,  t)  =  a2  ( t)  b  (t+T ) 
we  have 


( 29-b) 


Rfg<T>- 


2  IT 


d  h1  (f  t) 


at 


1/2 


k1(t,t)exp[  jx1h1(t1,T)  +  jir/4]  + 


(30) 


1/2 


2tt 


d  h2(t2,T) 


2  dt" 


k2  (t2,t)  exp  [  jx2h2  (t2,t)  +  jir/4] 


Figure  9.  shows  a  typical  time-frequency  plot 


The  two  shaded  areas  are  the  square  of  the  amplitudes  of 
the  two  contributions.  Since  we  do  not  know,  in  general,  what 
is  the  absolute  phase  relationship  between  the  two  components 
we  can  only  say  that  crosscorrelation  lies  between  the  limits 

| (Area  1) 1/2  +  (Area  2)1/2|  (31) 

Clearly  an  upperbound  on  the  crosscorrelation  would  be  the  sum 
of  the  square  roots  of  the  areas.  Clearly,  in  more  general 
cases,  an  upperbound  to  the  crosscorrelation  would  be  the  sum 
of  the  square-roots  of  all  intersecting  areas.  This  would  be 
accurate  only  if  all  phases  were  the  same. 


V. 


(1.25) 

Crossabmiguity  Functions 

The  results  of  the  previous  sections  can  be  extended  to 
wideband  crossambiguity  functions.  In  order  to  accomplish  this 
consider 

00 

Af  (t,s)  =  /s  j  f  (t)g*[s(t+t) ]dt  (32) 

—  00 

which  is  the  wideband  crossambiguity  function  for  the  two  signals, 
f  (t^  and  g (t) .  The  parameter  s  is  the  Doppler  stretch  factor. 

If  we  assume  that  the  functions  are  defined  the  same  as  in 
equations  (6-a)  and  (6-b)  then  we  can  observe  directly  that 


k  (t,t) 

=  /i  A'/-)  b  [ s  ( t+x  )  ] 

(33) 

h  ( t ,  t  ) 

=  Act  (t)  +  B8  (st)  . 

(34) 

The  stationary  phase  point  occurs  at  value  of  t  which  is 
the  solution  of 

/  A 

wf(t)  "  s  wg(st)  =  0  (35) 

Thus  equations  (33)  and  (34)  are  substituted  directly  into 
equation  (13)  in  order  to  obtain  the  asymptotic  estimate  of 

Afg* 

If,  as  in  the  example  of  linear  FM, 
wf(t)  =  At,  w  (t)  =  A(T-t) 


then  the  stationary  point  occurs  at 


;  s  (t-x) 
i  +  s2 

It  should  be  observed,  for  this  example,  that  the  instantaneous 
frequency  line  for  /s  g[st]  is 

w~(t)  =  -A  s2t  +  SAT 

and  the  amplitude  of  g  is  modified  by  the  stretch  factor  as 
well . 

With  these  modifications  in  the  functions  h  and  k,  the 
geometric  procedure  is  directly  applicable  to  the  asymptotic 
estimate  of  crossambiguity  functions.  Clearly  all  comments  and 
results  for  the  multiple  intersection  case  are  equally  valid. 


VI.  Conclusions 


We  have  shown  that  a  simple  geometric  procedure  can  be  used 
co  obtain  the  asymptotic  estimate  of  crosscorrelation  functions 
for  signals  whose  instantaneous  frequency  curves  cross  at  one 
point  only.  In  other  cases  this  procedure  produces  an  upper- 
bound  on  the  crosscorrelation  function.  The  procedures  des¬ 
cribed  can  be  applied  directly  to  ambiguity  functions  since  it 
is  a  particular  form  of  crosscorrelation  function. 

The  particular  utility  of  the  geometric  procedure  lies  in 
casts  with  which  large  numbers  of  FM  signals  may  be  checked  for 
their  correlation  properties.  Further  as  new  electronic  devices 
which  can  represent  signals  in  t-w  space,  are  developed  the 
area  concept  may  proove  useful  as  an  identifier  or  matched 
filter. 


Appendix 


In  this  appendix  we  establish  that  if  we  have  two  analytic 
signals,  k(t)  and  h(t),  then  the  crosscorrelation  of  the  real  parts 
is  equal  to  the  real  part  of  their  crosscorrelation  function. 
Further  that  the  magnitude  of  the  complex  crosscorrelation  function 
is  the  envelope  of  the  real  crosscorrelation.  Finally,  we  give  a 
computer  example  demonstrating  this  fact. 

To  begin,  we  assume  that  f^(t)  and  f£(t)  are  real,  unit  energy 
signals,  and  the  transform 

CO 

f(t)  =  H[f](t)  -  |  J  dT  (A-l) 

—  CO 

is  the.  Hilbert  transform.  Then  we  form  the  analytic  signals 


k(t)  = 


—  [f,  (t) 
/2  X 


+  j 


r1(t) ] 


(A-2-a) 


h(t)  =  —  [f„(t)  +  j  f,(t)] 
/2 


(A-2-b) 


Defining  the  inner  product  at 


(k,h) 


/ 

—  00 


k ( t) h* (t) dt 


(A-3 ) 


and  using  the  properties  of  Hilbert  transforms  and  their 
spectra  is  can  be  shown  that  if 


F i  (to)  =  A^  (to)  e 


j©1  (<u) 


(A-4-a) 


F2(u)  = 


A-  (to)  e 


j©o  (to) 


then 


(A-4-b) 


(k,h)  -  (f ! / f  2 )  +  3  tj-  /  A1  (to) A2  (to)  sin[a  (to)  ]dto]  (A-5) 

J0 


where 


a(to)  =  9^(oo)  -  ©2  (oo) 
Recognizing  that  if 


f^t) 


=  f(t) 


f2(t) 


=  g(t+x) 


(A-6-a) 


(A-6-b) 


Then  the  inner  product  is  a  crosscorrelation  function  and 


Re{IWT)> 


(t) 


(A-7) 


Further  since  has  a  one  sided  spectrum  then  its  real  and 
imaginary  parts  are  themselves  Hilbert  transforms  and  hence 
lRkh(T)l  is  the  envelope  of  R^ft). 

To  demonstrate  this  figure  10.  shows  the  magnitude  of  the  compl 
crosscorrelation  and  the  crosucorrelation  of  the  real  parts  of  the 
up  and  do m  linearly  sweeped  signals  shown  in  figures  2.  and  3., 


figure  11. 


Complex  autocorrelation  and  real  autocorrela¬ 
tion  of  128  Linear  FM  Upsweep,  with  10%  raised 
cosine  window. 


ACF  AND  ENVEL, LINEAR  UP  SWEEP 


40.00  80.00  120.00  160.00  200.00  240.00  280  0C 

TIME  DELAY 


plotted  on  the  same  graph.  Figure  11.,  shows  the  same  results  for 
the  autocorrelation  functions  of  the  up  sweep  signal.  The  fact 
that  real  signal  does  not  touch  the  envelope  at  several  cycles  is 
an  artifact  of  the  plotting  routine  drawing  straight  lines  through 
two  sample  points,  neither  of  which  are  at  the  peak  value  of  the 
cycle. 
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1 •  Wideband  Energy  Density  Functions 

In  this  section  we  outline  our  attempts  at  trying  to  find 
a  definition  of  Energy  density  function  for  wideband  signals, 
for  which  the  Doppler  effect  is  not  a  simple  translation  of  the 
spectrum  but  must  be  considered  a  stretching  of  the  signal.  The 
procedure  is  to  look  for  a  transformation,  T  characterized  by 

A 

a  kernel  k(t,p)  so  that  if  f(t)  is  some  other  transformation  of 
a  signal  f  (t) ,  and 


F(p) 


00 

-/ 


f (t)k(t,p)dt 


(1) 


Then  the  function 

Af (t,P)  =  f (t)  F* (P) k (t , P) 


(2) 


would  have  seme  or  all  of  the  properties  of  the  narrow  band 
Energy  density  function^ ' ^ ^  .  Most  of  t-he  transformations  are 
based  upon  combinations  of  Fourier  and  Mellon  Transforms.  The 
reason  for  this  is  that  the  Fourier  transform  converts  a  time 
delay  to  an  exponential  multiplier  in  the  transform  domain  and 
the  Mellon  transform  converts  time  stretch  into  an  exponential 
Multiplier  in  the  transform  domainf3'4^.  In  order  to  see  this 
consider  a  signal  f(t)  and  its  Mellon  transform  F(x).  Thus 


F  (x)  =  M[f  (t)  ]  (x)  = 


-  f f(et 


)  e^xtdt , 


(3) 


Inverting,  we  obtain  that 


=  JL  /*pfYi  “  jxln  ( t ) 


f  M-l 


An  alternate  form  for  equation  (3)  is 


00 

(x)  =  f(t)e  ^X‘'‘nTdT/T 


Now  consider  a  stretch  of  f (t)  , 


g  (t)  =  f  (st) 

Then,  from  equation  (4) 

00 

g  (t)  =  ~  f  F(x)e~jxln(st)dt 

from  which  we  can  easily  see  that 
G(x)  =  F(x)e"jxlns 

which  has  the  exponential  factor.  If  we  assume  small 
stretching,  ie. 

S  =  1  +  3,  |  3  |  «1 

then 

InS  =  ln(l+S)  ~  8 


and  we  have  that 


which  is  clearly  the  form  we  desire. 


However,  if  we  consider  the  effect  of  time-delay  on  the 
Mellon  transform  then  we  have  that 


00 

f(t-x)  =  ~  J' F(x)e_;ixln(t_T)dx 


(9) 


and  we  observe  that  the  kernel  is  not  separable. 

Suppose  we  try  a  Four ier-Mellon  transform  defined  as ^ 

00 

Gf(x)  =  W  /*<•">. 3*“d»  (in) 

—co 

where  F  is  the  Fourier  transform  of  f.  Letting  p  =  eu  in  equation 
10 ,  we  obtain 

g£(x>  -sr 

as  an  alternate  form  for  the  F-M  transform. 

Now  we  consider  a  time  delay  of  the  signal (4) 


f F (P) e” jxlnPdP/P 


(11) 


Then 


h(t)  =  f(t-x) 


H(a>)  =  F(aj)e“jaJT 


and 


V*>  =  i  /p(e“)e-3“e  ejx“du 


(13) 


or  observing  the  Fourier  form  of  this  integral  we  observe  that 


F(ew)e  =  jGh(x)e  JA“ du 

—  00 

where  as 

CO 

F(eW)  =  y*Gf  (x)e"jxaJdx 
—  00 

We  see  that  the  only  difference  is  an  exponential  factor. 
Considering  now  a  stretch  of  the  signal  so  that  again 

g(t)  =  f  ( s  t ) 

and 

G(w)  =  F  (oj/S) 
which  has  F-M  transform 

Gg(x)  =  Gf(x)ejxlnS  (16) 


and  still  retains  the  multilicative  exponential  form. 


II •  Possible  Definitions  of  WBEDF 

Based  upon  the  form  of  equations  (13)  and  (14)  and  the 
properties  of  the  F— M  transform,  we  define  a  wideband  enerqy 
density  function  as 

p 

Af (t ,t)  =  f (t)F*(eP)e"jte  eP  (17) 

We  can  show  that  this  function  has  most  of  the  integration 
properties  of  the  narrow  band  EDF’s.  We  assume  that  f(t)  is  a 
unit  energy,  analytic  signal. 

Properties  of  Af 

1*  Integration  with  respect  to  p 
00  00 

-  27  J Af(t,p)dp  =  J F*  (eP)e“jtePePdp, 

-00  -00 

and  letting  =  uj,  or  p  =  lnco  and  dp  =  dco/co.  Thus  we  have 

00  00 

27  f  Af  ,p)  dp  =  J  F*  (w)  e--^  twdu 

-oo  0 

Since  we  are  dealing  with  analytic  signals  F  has  support  (is  non 
zero)  only  for  w>0  and  we  may  extend  the  limits  of  integration 
from  o  to  (-°°)  .  Thus  we  have 

00  00 

27  /Af(t,P)dp  =  i^  /  F*  (w)  e”^  tuJdw 


(equation  continued  on  next  pg.) 


The  bracketed  term  is  f*(t),  so  that  finally 


CO 

27  jA £(t'P)dP  - 


f  (t) 


2 .  Integration  with  respect  to  t 


Af(t,P)dt  =  F*(eP)eP  /  f(t)e~^te  dt 


The  integral  is  clearly  F  evaluated  at  e  so  that 


J  Af  (t,p)dt  =  |F(eP)  |  2  eP 


We  see  that  Af  has  a  positive  t  and  p  integrals 


3 .  Integration  with  respect  to  t  and  p 

From  either  equation  (18)  or  (19)  we  obtain 


00  00 
27  J  Af(t,p)dtdp  =  /  I  f  (t)  I  2dt  = 


A  second  possible  definition  which  retains  the  same  inte 
gration  properties  as  A^(t,p).  Here  we  define  A^(t,p) 


Ae  (t ,P)  =  G„(t)F*  (eP)ejtPeP 


(21) 


Two  dimensional  Transform  properties 


One  of  the  most  important  properties  of  the  narrow  band 
energy  density  function  is  its  relationship  to  the  narrowband 
ambiguity  function.  Thus,  by  extracting  the  two-dimensional 
Fourier  transform  of  the  NBEDF  we  obtain  the  signal  ambiguifi 
function.  We  will  now  establish  that  a  similar  transformation 
converts  the  WBEDF  defined  by  Af(t,p)  to  the  WB  ambiguity  function 
except  for  the  scale  factor  /s. 

Consider  the  transform 


00  P 

T[Af  (t,P)  ]  (t,X)  =  w/IAf  (t'P)ej  (tX+Te  >dtdp 


If  we  reca.i  1  that  oi  =  e  then  this  is  a  two-dimensional  Fourier 
transform.  Substituting  for  Af  yields 


T[Af] 


00 

“  &  f 


,  P,  P  je 
F*  (e  )  e  e 


y'[GF(t)ejt 


Jt(\+P) 


dt] dp  (26) 


The  bracketed  term  in  equation  (26)  is  F[e  ]  so  that  we  have 


T[Af]  =~  f  F[eX+P]F* (eP)ePejePTdp 


P  X 

Letting  oj  =  e  and  s  =  e  ,  we  obtain 


T[Af] 


™  /  F  (soj)F*  (oj)  e^TaJdoj 


Now  substituting  for  F,  we  have 


f*(t2)e 


T[A^]  =  JJJf  (t,)e  1 


f  2tt  JJJ-'l 
—  00 


■e  dudtjdt.,  (29) 


Integrating  with  respect  to  oj  and  then  yields 


=  ^'f(tl,£*<t2)s<t2‘sti‘T)dtidt; 


=  f  f  (t^  f*  (st^Tjdtj^ 


(30) 


_  1  , 

=  —  Xf(T,s) 

/s 

Equation  (30)  is  the  wideband  ambiguity  function  for  f(t), 
except  for  the  square-root  of  s  factor.  Thus,  if  the  kernel 
in  (25)  had  been 

k(t,X,x,P  )  =  eA/2  ej (tX+TeP) 

we  would  have  an  exact  transformation. 

Clearly  since  the  transform  is  based  upon  a  Fourier  Transform 
it  is  invertible  and  we  can  obtain  from  x..-» 


IV.  Conclusions 


We  have  been  able  to  establish  that  certain  of  the  properties 
of  the  narrowband  energy  density  function  we  also  valid  for  the 
two  definitions  of  wideband  energy  density  functions,  through  the 
use  of  the  Fourier-Mellon  transform.  The  important  observation 
here  is  that  signals  should  be  represented  in  log  frequency  vs. 
time  plots,  and  stretches  in  frequency  become  translations.  The 
physical  meaning  of  A^  and/or  A^  are  subjects  of  continuing  study, 
but  we  have  established  at  least  a  link  from  the  narrowband  to 
the  wideband  theory  of  energy  density  functions. 
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Abstract 


The  possibility  of  using  high  resolution  maximum  entropy 
periodicity  estimation  technique  to  resolve  closely  separated 
echoes  is  studied  in  this  report.  By  Fourier  transforming 
the  matched  filter  output,  a  short  length  periodic  series  in  the 
^^®^T^®ncy  domain  is  obtained.  The  maximum  entropy  filtering 
method  is  applied  to  this  series.  A  study  of  the  relevant 
literature  shows  that  tractable  analytic  models  for  determining 
the  performance  of  the  maximum  entropy  method  with  short  length 
sinusoidal  inputs  do  not  exist.  A  limited  number  of  computer 
simulations  indicate  that  the  technique  does  not  resolve  closelv 
spaced  echoes  reliably. 


Notation 


Unless  indicated  otherwise,  the  following  abbreviations  and 

notations  will  be  used  throughout  this  chapter. 

ACF  :  Autocorrelation  function. 

AR  :  Autoregression 

ARMA  :  Autoregressive  Moving  average 

B  :  Nyquist  frequency 
BW  :  3  dB  bandwidth 

T 

C  :  (C^,C2,  .  .  .CL)  =  AR  coefficients 

DFT  :  Discrete  Fourier  Transform 
E{  }  :  Expectation  operator 

f  :  frequency  variable,  -l/2£f < 1/z 

I  :  identity  matrix 

j  :  /-I 

L  :  order  of  AR  model 

LMS 

algorithm  :  "Least  mean  square"  algorithm  [51] 

N  :  Number  of  data  samples 
MEM  :  Maximum  entropy  method 
PEF  :  Prediction  Error  Filter 
PSD  :  Power  spectral  density 
r.v.  :  random  variable 

Y  (k)  :  E{x(i)  x*  (i+k)}  =  True  ACF  value  of 

{x}  at  lag  K. 

I  :  (Y  (1) »  Y (2) ,  . . .Y (L) )T 

A 

Sxx(f)  :  Estimate  of  the  input  PSD  at  frequency  f  by 
method  xx 

S(f )  :  "True"  PSD  of  the  input  at  frequency  f;  = 

Fourier  Transform  of  { y  (k )  } . 


{x(i)}”=1  : 

(3.3) 

set  of  N  data  samples 

XR  : 
{n (i) }N  : 

(XR'  XR- 1 '  *  *  *  xR-L+1)T 

set  of  independent  additive  noise  samples. 

a  : 

adaption  constant  in  LM5  algorithm 

Cj,k  : 

Kronecker  delta  function 

=  1  if  j  =  k,  0  otherwise 

co  : 

radian  frequency  =  2tt f . 

Double  underlining  indicates  a  matrix;  and  single  underlining, 
a  vector. 

superscripts:  x*  denotes  complex  conjugate  of  x. 

T 

A  "  transpose  of  A 

/\ 

caret  denotes  estaimte,  e.g.  y (k)  =  estimate  of  y(k) 


Section  I. 


Introduction 

Lnderwater  acoustic  communication  channels  are  often 
degraded  by  multipath  and  doppler  scaling  effects.  It  is  be¬ 
lieved  [53]  that  the  multipath  propagation  can  be  adequately 
modeled  by  a  small  number  of  point  reflectors.  If  a  signal 
u(t)  is  transmitted  into  the  channel,  the  received  signal  r(t) 
can  be  represented  by 

I 

Y(t)  =  21  a.  u  (S .  (t— x  • )  )  +  n(t)  (1.1) 

i=l  1  11 

where  { t . }  are  the  delays  associated  with  the  multipath,  {S^} 
are  the  doppler  scales,  {a^}  are  the  amplitudes  and  n(t)  is 
additive  noise,  usually  considered  to  be  white.  The  doppler 
effect  can  arise,  for  example,  due  to  relative  motion  between 
transmitter  and  receiver. 

Knowledge  of  {a^}  and  {t^}  yield  a  better  understanding  of 
the  nature  of  the  communication  channel,  and  are  also  useful  for 
eliminating  intersymbol  interference  effects. 

The  conventional  technique  for  estimating  {a^}  and  {x^} 
is  to  use  a  matched  filter  [48],  The  filter  matched  to  s  (t) 
is  a  linear  system  with  impulse  response  given  by 
h(t)  =  s  (T-t)  where  T  is  a  delay  usually  included  to  make  the 
filter  realizable. 

If  the  received  waveform  is  processed  with  the  matched 
filter  for  u(t),  the  output  is  given  by 


(3.5) 


I 

=  ^  a.  X  )  +  Noise  Term 

i  u  j-  -i- 


(1.2) 


where  Xu(si/Ti)  is  the  wideband  ambiguity  function  of  u(t)  de¬ 
fined  by 


x  ( S ,  ,  T ,  ) 


u 


►Si  u ( t )  u(Si(t-Ti))  dt 


(1.3) 


The  effect  of  doppler  scaling  can  be  made  insignificant 
by  choosing  u(t)  to  be  a  doppler  tolerant  signal.  Such  a  signal 
is  of  the  form 


u ( t )  -  a(t)e^  ,  o  <  t  <  T  ,  where 


a ( t)  and  K  determine  the  bandwidth  of  u(t)  and  a(t)  determines 
the  sidelobe  levels  of  Xu(S^,T^).  The  doppler  tolerant  signal 
has  the  property  that 


vw 


-  e 


jklnSi  x  (1  } 

u  '  1 


Thus  the  effect  of  using  this  type  of  signal  is  to  convert 
the  nonlinear  doppler  scaling  effect  to  a  simple  linear  phase 
shift.  Detailed  discussion  of  the  ambiguity  function  and  doppler 
tolerant  signals  can  be  found  in  [6]. 

With  matched  filter  processing  {a^  and  { x ^ }  are  estimated 
as  the  largest  I  peaks  of  x(t). 

Tne  closest  separation  between  any  two  delays  {x • }  that  can 
be  detected  by  the  matched  filter  is  inversely  proportional  to 


the  bandwidth  of  u(t)  (with  suitable  definition  of  bandwidth 
and  resolution).  In  underwater  acoustic  channels,  high  frequency 
signal  components  are  highly  attenuated.  The  low  bandwidth  of 
the  acoustic  channel  may  limit  the  resolution  provided  by  the 
matched  filter  to  unacceptably  low  levels. 

The  Fourier  Transform  of  x(t)  in  eq.  (1.2)  can  be  written, 
if  u(t)  is  a  doppler  tolerant  signal,  as 

X(f)  =  Eai  ejklnSi  j  U  (f )  !  2  e”  j  27rf  T  i 

=  I U  (f )  I  2  Ea.  ejklnSi  e"j27TfTi  (1.4) 

Equation  (1.4)  can  be  considered  as  a  "time  series"  in 
frequency  domain.  The  problem  of  estimating  {a^  gllklnS-j^ 
and  {x^}  are  frequency  domain  analogs  of  the  usual  harmonic 
analysis  of  time  series.  However,  since  the  bandwidth  of 
|u(f) |  is  low,  the  length  of  available  data  is  very  limited. 

A  nonlinear  spectral  analysis  technique  known  as  maximum 
entropy  method  has  been  proposed  to  obtain  high  resolution 
spectral  estimates  from  small  lengths  of  data.  This  report 
investigates  the  possibility  of  applying  this  technique  to  the 
resolution  of  closely  spaced  multipath  delays. 

Other  nonlinear  techniques  are  available  for  the  estimation 
(and  possibly,  removal)  of  multipath  delays.  The  most  important 
among  these  are  maximum  likelihood  spectral  estimation  methods 
[16,37]  and  cepstral  analysis  [10,54].  It  is  reported  [14] 
that  the  former  is  not  superior  to  the  maximum  entropy  method. 


Spectral  Estimators. 

Unless  otherwise  indicated,  the  observed  input  will  be 
assumed  to  be  sampled  at  periodic  intervals  of  1  time  unit  and 
that  only  the  sampled  values  are  available  for  processing. 
Aliasing  errors  can  be  eliminated  by  lowpass  filtering  the  analog 
input  before  sampling  and  such  errors  will  be  assumed  to  be 
negligible . 

The  conventional  method  of  spectrum  analysis  is  to  estimate 
the  autocorrelation  function  (ACF)  of  the  observed  data,  multiply 
the  estimated  ACF  by  a  suitable  taper  function  and  compute  the 
Fourier  transform  of  the  tapered  ACF.  [9,39,54]  This  technique 
does  not  make  use  of  any  known  structural  properties  of  the 
observations .  Consequently,  one  is  forced  to  treat  the  spectral 
value  at  each  frequency  as  an  independent  variable.  Due  to  the 
large  number  of  unknown  quantities  to  be  estimated,  one  requires 
large  lengths  of  data  in  order  to  obtain  adequate  statistical 
stability. 

Quite  often,  the  input  process  can  be  satisfactorily  approxi 
mated  as  the  output  of  a  discrete  linear  system  driven  by  uncor¬ 
related  noise  [1,12,13,24,27,32,35,43],  A  special  class  of 
linear  systems  consists  of  systems  which  have  only  poles  and  no 
zeroes.  In  this  case,  the  observations  can  be  modeled  by  the 
stochastic  difference  equation 


where  {C^}  =  C_  are  unknown. 

Equation  (2.1)  has  the  appearance  of  a  regression  equation 
in  which  the  independent  variables  are  past  values  of  the  ob¬ 
servations  themselves.  Therefore,  (2.1)  is  often  called  an 
autoregressive  (AR)  model  of  order  L.  For  obvious  reasons,  it 
is  called  an  all  pole  model  also.  Given  x(k-l),  x(k-2),  ..., 
the  best  prediction  of  x(k)  (in  a  least  squares  sense;  and  in 
a  maximum  likelihood  sense  if  (n(k)}  are  gaussian)  is  a  linear 
combination  of  the  past  input  values.  Hence,  the  use  of  the 
term  "Linear  Predictive  Model".  The  terms  (n(k)}  are  called 
residuals,  prediction  errors,  or  "innovations"  since  they  repre 
sent  the  "new  information"  in  {x(k)>  ,  i.e.,  the  part  that 
cannot  be  predicted  from  past  values.  The  filter  with  impulse 
response  (1,  -C^,  -C^,  •••  ^nown  as  a  prediction  error 

filter  (PEF) ,  since  the  effect  of  operating  on  the  input  data 
with  this  filter  is  to  obtain  the  prediction  errors,  (n(k)}. 

The  PEF  is  sometimes  called  "whitening  filter." 

By  multiplying  both  sides  of  (2.1)  successively  by  n(k), 
x(k),  x(k-l)  ....  x(k-L)  and  taking  expectations,  the  following 
equations  are  obtained: 

» 

Y(0)  -  Y  ( 1 )  C1  -  ...  -  y  (L)  CL  =  CT2 

Y(l)  -  Y  (0 )  C1  -  ...  -  y(L-l)  CL  =  0 

(2.2) 


Y(L)  -  Y(L-l)  C.  -.  .  .  -  Y  (0)  CT 


0 


from  which  the  L+l  unknowns  C  and  a  can  be  estimated. 

An  alternative  method  of  computing  C  is  by  solving  the 
linear  equations 


Y  (  U)  .  . 


Y(L-l) 


Y(L-l)  .  . 


Y(0) 


Yd) 

Y(2) 


(2.3) 


Y(L) 


and  a  can  be  calcualted  from  the  first  equation  of  (2.2). 
Equations  (2,3)  are  known  as  the  normal  equations,  or  Yule- 
Walker  equations. 

The  same  equations  can  be  obtained  by  minimizing  the  mean 
squared  prediction  error 


J-l 

E(x(k)  -  £  C.x(k-i)) 

i=l  1 


(2.4) 


BY  applying  a  z-transform  [54]  to  both  sides  of  the  first 
equation  in  (2.2),  it  can  be  seen  that 

Power  Spectral  Density  (PSD)  of  the  input  =  S.D  (f) 


1  -  £c.  e~J  2irfi  2 


i  1 


This  equation  provides  a  method  of  calculating  the  input  PSD 
from  a  knowledge  of  the  PEF  coefficients.  Since  the  PSD  of  the 
PEF  is  the  inverse  of  the  input  PSD,  the  PEF  is  known  as  an  inverse 


filter. 


An  alternative  inpterpretation  for  the  AR  spectral  estima¬ 
tion  technique  has  been  presented  by  Burg  [15].  Burg  observes 
that  the  loss  of  entropy,  in  an  information  theoretic  sense,  by 
passing  (band  limited)  white  noise  through  a  linear  system  with 
frequency  response  C(f)  can  be  written  as 


Ae  =  -  In  C  (f)  r  df 


(2.5) 


This  expression  is  minimized  subject  to  the  constraint 
that  the  inverse  Fourier  Transform  of  | C ( f )  |  2  should  be  equal 
to  the  estimated  data  ACF  values  up  to  lag  L.  The  result  of  this 
is  equations  (2.3)  .  Burg  labels  this  spectral  estimation  method 
"Maximum  Entropy  Method." 

Makhoul  [31]  observes  that  the  AR  spectral  estimate  minimizes 
the  integrated  spectral  ratio 


S(f) 

§AR<f> 


This  interpretation  is  useful  for  modeling  a  selected  portion 
of  the  input  spectrum  by  an  AR  process. 

The  main  attraction  of  the  AR  spectral  estimation  method 
is,  of  course,  that  it  reduces  the  number  of  unknown  parameters 
to  the  minimum,  and  therefore  better  satistical  stability  of 
the  estimates  can  be  obtained.  Possible  sources  of  error  of  this 
method  are:  (i)  the  presence  of  zeroes  in  the  linear  system  which 


generates  the  observations,  (ii)  wrong  choice  of  order  of 
model,  L,  i i i )  presence  of  additive  noise  in  the  observations 
Due  to  the  highly  nonlinear  nature  of  the  technique,  it  is 
extremely  difficult  to  analyze  its  performance  rigorously.  In 
the  succeeding  sections,  we  shall  summarize  the  current  state 
of  investigation  of  this  topic. 

Several  applications  of  AR  modeling  have  been  discussed 
in  [3,4,11,12,23,25,30,33,36,38,44] . 
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Section  III.  Computational  Algorithms. 


There  are  at  least  three  different  computational  algorithms 
for  the  calculation  of  the  AR  coefficients,  C_.  In  each  case, 
it  is  assumed  that  the  mean  (dc)  component  of  the  data  is  zero 
or  has  been  subtracted  out. 

III.l.  By  solving  the  Yule-Walker  Eauations 


In  this  method,  the  autocorrelation  values  of  the  data 
up  to  lag  L  are  estimated  and  the  Yule-Walker  equations  (2.3) 
are  solved  by  directly  inverting  the  ACF  matrix.  There  are  two 
different  methods  of  estimating  the  ACF.  In  method  1(a),  the 
estimates  are 


Y(  j) 


-i  nf 


xk  Xk+ j  ' 


0  <  j  <  L  <  N 


(3.1) 


and  in  method  1(b), 


Y  ( j  ) 


=  _L_  V 

N-i 


T  ^  Xk  Xktj  '  0  <  j  <  L  <  N 


(3.2) 


Equation  (3.1)  effectively  assumes  that  the  data  is  extended 

with  zeroes.  Eq.  (3.2)  gives  an  unbiased  estimate  of  Y(j),  the 

N 

true  autocorrelation  for  lag  j,  if {  x^}  are  assumed  to  be 
samples  of  a  stationary  stochastic  process.  Asymptotically,  as 
N-*°°,  the  two  estimates  become  identical  for  all  finite  lags. 
However,  for  small  N,  the  two  estimates  may  have  significantly 
different  properties.  The  ACF  matrix  estimated  from  (3.1)  is 
guaranteed  to  be  positive  definite.  It  is  quite  possible  that 


the  ACF  matrix  corresponding  to  (3.2)  may  have  zero  or  negative 
eigenvalues,  leading  to  computational  instability  and  absurd 
estimates  of  PSD. 

The  Levinson  algorithm  [29]  is  an  efficient  method  of  solving 
the  Yule-Walker  equations.  Briefly,  this  algorithm  iteratively 
generates  solutions  of  order  k  from  solutions  of  order  k-1. 

The  solution  for  order  1  is  trivial  and  can  be  written  down  by 
inspection.  When  k=L,  the  algorithm  terminates.  The  derivation 

the  complex  Levinson  algorithm  is  presented  in  Appendix  -  I. 

2 

This  algorithm  requires  on  the  order  of  L  multiplications  and 
2L  storage  locations  for  temporary  variables. 

It  is  interesting  to  speculate  whether  a  different  recursive 
scheme  will  lead  to  an  algorithm  with  a  smaller  number  of  multi¬ 
plications.  For  example,  can  solutions  of  order  k  be  generated 
from  solutions  of  order  k/2?  This  "divide  and  conquer"  concept 
is  responsible  for  the  computational  effectiveness  of  the  Fast 
Fourier  Transform  algorithm  [17].  Our  investigations  along  this 
line  have  proved  futile.  However,  it  should  be  pointed  out  that 
since  L  is  usually  relatively  small,  the  computational  effort 
required  for  computing  the  ACF  far  outweighs  the  effort  to  invert 
the  ACF  matrix. 

III. 2,  Burg's  Algorithm 

Burg  [15]  has  proposed  an  algorithm  for  computing  the  MEM 
filter  coefficients  which  does  not  require  explicit  estimation 
of  the  ACF.  This  method  calculates  the  prediction  error  filter 
coefficients  (p.6)  by  an  iterative  scheme  that  resembles  Levinson' 
algorithm.  A  derivation  of  the  complex  Burg  algorithm  is  given 


in  Appendix  -  II.  Note  the  different  error  criterion,  eq.  A-2.8. 
This  ensures  that  the  computed  PEF  coefficients  will  represent 
a  stable  filter. 

This  algorithm  requires  =Nl/  multiplications  and  =2N  storage 
locations.  Clearly,  this  algorithm  is  most  effective  when  the 
number  of  data  samples,  N,  is  small.  When  N  becomes  large  com¬ 
pared  to  L,  the  Levinson  algorithm  and  Burg  algorithm  tend  to 
become  identical. 

III.  3.  Gradient  Based  Methods. 

It  is  possible  to  calculate  the  AR  coefficients  without 
explicitly  calculating  or  inverting  the  ACF  matrix.  The  idea 
is  to  use  some  form  of  stochastic  approximation  (gradient  seeking) 
method  to  solve  eq.  (2.3)  (49,51,52].  The  algorithms  are  of  the 

general  form 

Ck+1  =  C k  +  a  (k)  Xk  (x(k+l)  -  x£  Ck)  (3.3) 

where  CR  denotes  the  estimate  of  C  at  sampling  instant  k,  a  (k) 
is  a  predetermined  gain  sequence  and 

Xk  =  (x  (k)  ,  x(k-l),  .  .  .  x  (k-L+1) T 

The  choice  of  a'k)  =  a  =  constant  corresponds  to  the  Widrow-Hoff 
LMS  algorithm  [51].  The  most  striking  feature  of  this  algorithm 
is  its  simplicity,  since  only  2LN  multiplications  and  L  storage 
locations  are  necessary.  This  algorithm  can  also  track  slow 
variations  in  C.  The  price  paid  for  these  advantages  is  that 
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Section  IV. 

Statistical  Properties  of  Spectral  Estimators. 

Before  any  statistical  estimation  procedure  is  applied  in 
practice,  it  is  useful  to  have  an  understanding  of  its  statistical 
properties,  such  as  its  variance,  probability  distributions,  con¬ 
fidence  intervals,  etc.  In  the  case  of  PSD  estimators  in  par¬ 
ticular,  intelligent  tradeoffs  between  resolution,  variability 
of  the  estimates  and  computational  difficulty  cannot  be  made 
without  a  good  understanding  of  the  behavior  of  the  variable 
possible  alternatives. 

IV. 1.  Windowed  DFT . 

The  statistical  properties  of  the  conventional  windowed 
Discrete  Fourier  Transform  techniques  are  well  known  and  are 
discussed  in  detail  in  [7],  [27],  [9],  [39],  and  [54].  Briefly, 
this  method  estimates  the  ACF  values  for  lags  0,  ...,  N-l  via 
equations  (3.1)  or  (3.2)  and  estimates  the  PSD  by  the  relation 

SDFT(f)  =:  £  Y(k)  W(k)  e"j2irfk  (4.1) 

k=- (n— 1) 


Y  (-k)  =  y  (k)  .  0  <_  f  £  1/2 

W(k)  is  a  suitably  chosen  window  (or  taper)  function  such  that 
W(-k)  =  W(k) .  It  is  often  required  that  W(k)  be  a  positive 
semidefinite  function,  i.e. 

N-l 

y~l  w(k)  e  j2  fk  _>  0  for  0  <  f  <  1/2  (4.2) 

k*-  (N-l) 


(3.19) 

The  purpose  of  this  is  to  ensure  that  the  resulting  PSD 
estimate  is  nonnegative,  as  it  should  be.  Some  of  the  commonly 
used  window  functions  are  the  triangular  (Bartlett)  window, 
the  raised  cosine,  Hamming,  Hanning,  and  Kaiser  windows. 

If  N  is  a  highly  composite  number,  (i.e.  it  can  be  decomposed 
into  a  product  of  a  large  number  of  small  prime  numbers  -  for 
example,  N=2  ) ,  a  fast  algorithm  known  as  the  FFT  can  be  used 
to  compute  SDFT(f)  at  discrete  values  of  f  =  m/N.  Fast 
algorithms  for  the  computation  of  the  ACF  and  PSD  are  given  in 
[39]  and  [54]. 

Assume  that  W(k)  =0,  |k|  > m.  Then  if  the  true  PSD  of  the 

random  process  under  consideration  is  continuous,  it  can  be 
shown  that 


lin  |  sDFT(f)  -  S ( f ) |  +  0  in  probability  (4.3) 

n-*-°° 

n-*-°° 

L/N+0 

This  property  does  not  hold  if  the  data  contains  strictly 
periodic  components. 

The  effect  of  the  window  function  is  to  locally  average 
the  power  spectral  density  estimate  that  would  be  obtained  if  the 
window  function  were  not  used.  This  results  in  reducing  the 
noise  component  of  the  PSD  estimate  at  each  frequency,  while  at 
the  same  time  "smearing"  sharp  spectral  lines  into  adjacent 
frequency  cells.  Therefore,  the  use  of  windows  reduces  the 
statistical  variability  of  the  PSD  estimates  at  the  expense  of 
a  reduction  in  resolution. 


AR  spectral  estimators 


If  S(f)  is  assumed  to  be  continuous  and  bounded  away  from 
zero  and  infinity,  a  result  due  to  Berk  [8]  states  that  the  AR 
spectral  estimate  is  consistent. 


lin 

ISb-oo 

L/N-0 


SAR,L(f) 


-  S(f) 


0  in  probability, 


(4.4) 


This  result  is  intuitively  obvious  from  the  discussion  in 
Sec.  II.  As  N-*-«,l/n-*0 ,  the  first  L  elements  of  the  inverse 

A 

Fourier  Transform  of  S  L(f)  tend  to  the  true  ACF  values,  and  if 
the  tails  of  the  ACF  are  negligibly  small  for  sufficiently  long 
lags,  it  is  apparent  that 


Un  SAR,L(f)  *  S<f>' 


Berk  further  shows  that 


L  (SAR,L(f)  ‘  S (f ) ) /S (f ) 


has  a  limiting  normal  distribution  with  mean  zero  and  variance 
equal  to  2  for  f  /  0  or  1/2  and  equal  to  4  when  f  =  0  or  1/2; 


C°V\'Z  (SAR,L(fl>  -  S<£1>'  If  <W<f2>  -  S<f2))| 


(4.5) 


0  in  probability  for  f.  /  f. 


These  statements  are  valid  only  asymptotically.  The  be¬ 
havior  of  the  AR  spectral  estimator  is  seen  to  be  identical  to 
that  of  the  conventional  windowed  DFT  spectral  estimator,  as 
L,  N  -*■  oo  and  L/N  -+  0 . 

The  Burg  algorithm  differs  from  the  Levinson  algorithm  only 
in  that  the  first  and  last  L  data  samples  are  treated  differently. 
Therefore,  the  Burg  spectral  estimates  can  be  expected  to  have 
the  same  asymptotic  properties  as  discussed  above. 

The  AR  PSD  estimates  using  LMS  algorithm  cannot  be  consistent 
no  matter  how  large  N  is. 


IV. 3.  Confidence  Intervals  for  AR  spectral  Estimates. 

Quite  often,  it  is  desirable  to  form  interval  es.timates  of 
parameters  rather  than  point  estimates.  This  is  often  done  by 
setting  up  confidence  intervals  for  the  point  estimtes.  This 
topic  is  treated  in  detail  in  most  texts  on  statistical  inference, 
e.g.  [41]. 

A 

The  derivation  of  useful  confidence  intervals  of  S._,(f) 
has  proved  to  be  a  very  difficult,  if  not  impossible,  task. 

The  problem  was  approached  as  follows: 

A 

Since  the  autoregressive  coefficients  C  have  been  obtained 
by  conventional  least  squares  regression  analysis,  in  the  large 

A 

sample  case,  /n  C  has  the  approximate  multivariate  normal  dis- 

tribution  with  mean  C  and  covariance  matrix  s  A„  where 

—  — n 


2 

s 


1 


N  -  L 


N  A 

£  (X.“C  x-  -\>  and 

i=l  1  1-1 


(4.6) 


(equation  continued  on  next  pg.) 


Using  this  fact,  the  usual  statistical  inference  on  C, 
such  as  hypothesis  testing,  confidence  intervals,  etc.,  can 
be  carried  out. 

a  .  —  £  • 

The  quantity  (1  -  £  <v  e"1  27T£l)  is  also  a  Gaussian 

i=l  1 

random  variable  whose  mean  and  variance  can  be  calculated  in 

^  —  1 

terms  of  these  of  C,  viz.,  C  and  s  .  This  follows  from  the 

fact  that  a  linear  combination  of  jointly  gaussian  random  vari¬ 
ables  is  again  gaussian. 

With  suitable  scaling,  the  quantity  jl  -  £  C.  e"^2TT£i|2 

i=l  1 

can  be  shown  to  have  a  noncentral  x2  distribution  with  2  degrees 
of  freedom.  This  quantity  is,  of  course,  the  denominator  of  the 

A 

expression  for  SAR(f). 

.  2  *  ''t  ^ 

The  quantity  s  =  Y(o)-  C  y_  ,  appearing  in  the  numerator  of 

A 

the  expression  for  sAR(f),  is  the  estimate  of  the  variance  of 
the  residuals.  If  the  observations  are  gaussian,  least  squares 
regression  theory  shows  that  with  proper  scaling,  s2  is  a  x2 
distributed  random  variable  with  N-L  degrees  of  freedom.  More- 

A 

over,  the  numerator  and  denominator  of  S  (f)  can  be  shown  to 
be  independent,  since,  from  least  squares  theory,  s2  and  C 
are  independent.  Therefore,  at  least  in  principle,  the  distri- 

A 

bution  of  SAR(f)  can  be  calculated. 

There  are  several  difficulties  in  the  practical  application 
of  the  foregoing  theory.  First,  there  is  no  krown  closed  form 


(3.23)  2 

expression  for  the  distribution  of  the  ratio  of  a  X  distributed 

2 

r.v.  to  an  independent  noncentral  X  distributed  r.v.  More 
seriously,  the  unknown  values  of  the  true  AR  coefficients,  namely 

A 

c,  enter  into  the  distribution  of  S  (f)  in  a  nonlinear  manner 
such  that  it  cannot  be  factored  or  subtracted  out  sasily.  This 
restricts  the  utility  of  the  preceding  theory  to  simulation 
studies  where  C  can  be  calculated  exactly  by  analytic  or  compu¬ 
tational  means. 

An  approximate  solution  to  the  problem  can  be  constructed 
from  the  expression  for  the  asymptotic  distribution  of 

-c  AR(f)  - sl£), 

L  '  S(f)  ' 

which  is  given  in  eq.  (4.5).  This  expression  is  valid  only  if 
the  number  of  observations  is  large,  and  it  is  not  known  how 
large  the  set  of  samples  should  be  before  this  formula  can  be 
applied. 

IV. 4.  Choice  of  the  order  of  autoregression. 

A  good  choice  of  L,  the  order  of  the  AR  model,  is  clearly 
important.  If  L  is  too  small,  the  resulting  AR  fit  will  not 
represent  the  data  very  well  in  the  sense  of  a  small  residual 
mean  square  error,  while  a  large  value  of  L  wastes  computational 
resources  and  may  also  lead  to  increased  round  off  errors  and 
other  numerical  problems. 

There  are  several  possible  ways  to  choose  an  appropriate  value 
of  L.  The  physical  mechanism  which  produces  the  observed  data, 


if  known,  may  provide  useful  clues.  Another  possibility  is  to 
choose  a  large  order,  say  ,  AR  fit  to  the  observed  data  and 
calculate  the  distribution  of  the  AR  coefficients.  The  hypothesis 
that  the  true  order  of  AR  is  L  is  equivalent  to  the  hypothesis 
that 


This  hypothesis  can  be  tested  by  statistical  inference 
methods  [41] 

Perhaps  a  simpler  method  is  to  fit  AR  models  of  different  order 
to  the  observed  data  and  in  each  case,  estimate  the  variances  of 
the  residuals.  We  have  already  indicated  that  the  distribution 
of  these  variances  can  be  related  to  a  x ^  distribution.  Now  the 
usual  variance  ratio  tests  can  be  applied  to  determine  the 
lowest  order  beyond  which  increasing  the  order  of  AR  does  not 
result  in  a  statistically  signficant  reduction  in  residual 
variance . 

Akaik  [4]  has  proposed  an  alternative  which  simplifies 
this  procedure.  He  suggests  a  final  prediction  error  (FPE) 
statistic 

FPE (L )  =  (1  +  s2,  with  s2  as  (4.7) 

in  Sec.  IV. 3.,  eqn.  (4.6) 

The  value  of  L  which  minimizes  FPE  (L)  is  taken  to  be  the 


true  order  of  autoregression. 


This  criterion  is  particularly  simple  to  apply  with  gradient 
algorithms . 

Simulation  studies  of  the  use  of  the  FPE  criterion  in 
selecting  the  order  of  the  AR  model  have  been  presented  by 
Ulrych  and  Bishop  [46].  FPE(L)  for  different  values  of  L, 
using  both  the  Yule-Walker  solution  and  Burg  algorithm,  have  been 
computed.  The  minimum  attainable  FPE  for  the  Yule  Walker  solu¬ 
tion  is  typically  much  lower  than  that  for  the  Burg  solution. 

This  is  not  surprising,  since  the  former  was  developed  by 
minimizing  the  residual  error  energy.  As  L  approaches  N,  the 
FPE  for  both  methods  increases  considerably. 


Section  V.  Generalizations. 

V.l.  Pole  zero  modeling. 

A  more  general  model  for  the  observed  data  allows  the  white 
noise  driven  linear  system  generating  the  data  to  have  poles  and 
zeroes.  The  data  are  modeled  by  the  difference  equation 

L  M 

x  (k+1)  =  £  C.  x(k-i)  +  £  b.  n(k-i)  (5.1) 

i=0  i=0  1 

where  the  input  white  noise  sequence  (n(k)}  is  unknown.  {C± } 
and  (b^}  are  the  unknown  system  parameters  to  be  estimated. 

If  (n (k) }  were  known  {C^}  and  {b^}  can  be  easily  estimated 
using  classical  regression  methods.  An  alternative  computation 
technique,  using  the  LMS  algorithm,  has  been  proposed  by  Widrow 
et  al  [52]  and  called  "Adaptive  Noise  Cancelling". 

In  statistical  literature,  the  pole  zero  model  (5.1)  is 
also  known  as  the  Autoregressive  Moving  Average  (ARMA)  model 
or  rational  model. 

Define  the  z-transform  of  a  discrete  sequence  (x(i)}  by  the 
relation 


x  ( z )  =  Ex(i)  z"1  (5.2) 

i 

The  region  of  convergence  of  this  series  will  depend  on 
the  sequence  {x(i)}. 

With  this  notation. 


where 


z 


B  ( z )  =  2_,b. 

i=0 

L 

C(z)  =  2  C.  z_1 
i=0  1 

The  case  where  B(z)  =  1  corresponds,  of  course,  to  an  all 
pole  model. 

The  problem  of  identifying  {C^}  and  {b^}when  (n(i)}  are 
unknown,  is  nonlinear  and  extremely  difficult  [7],  An  approxi¬ 
mate  solution  can  be  obtained  as  follows: 

If  all  the  zeroes  of  the  numerator  and  denominator  are 
within  the  unit-z  circle,  it  is  possible  to  express  1/B(z)  as 
a  Taylor  series  about  z”1  =  0: 

X(z)  =  C  ( z )  •  (l/B  (z) )  '  N(z) 

=  _ 1 _  •  N(z)  (5.4) 

L  ,  00 

Z  ci  Z_1  E  Z_1 

i=0  1  i=0  1 

Both  the  polynomials  in  the  denominator  of  (5.4)  are  analytic 

in  the  region  |z  ^|<1  by  virtue  of  the  assumption  that  all  the 

poles  and  zeroes  of  X(z)  are  inside  the  unit  circle.  Therefore, 

(5.4)  represents  a  stable  infinite  order  AR  process.  It  is 

00 

reasonable  to  hope  that  the  coefficients  become 

negligible  for  sufficiently  large  values  of  D,  so  that  (5.4) 
can  be  approximated  by 


Spectral  analysis  of  the  all  pole  model  (5.5)  can  be 
carried  out  quite  easily.  The  estimation  of  the  coefficients 
{C^}  and  {b^}  in  (5.1)  poses  a  more  difficult  problem.  One 
method,  suggested  by  Graupe  and  Perl  [20],  is  to  estimate  the 
coefficients  of  the  equivalent  AR  model  (5.5),  use  these  estimates 
to  calculate  the  residuals  (n(k)}  and  use  the  estimated  residuals 
to  compute  {C^}  and  {b^}  by  regression  methods.  An  attempt  at 
estimating  the  error  involved  in  this  procedure  has  been  made 
in  [19]  and  [20],  but  the  expressions  for  bounds  on  errors  are 
unilluminating. 

An  alternative  method,  due  to  Durbin,  is  given  in  Anderson 

[7]. 

An  application  of  all  zero  modeling,  in  data  communications, 
is  to  the  problem  of  eliminating  intersymbol  interference.  If 
the  information  symbols  { S  (k ) } ,  usually  assumed  to  be  in¬ 
dependent,  identically  distributed  random  variables  with  a  finite, 
discrete  support,  is  transmitted  over  a  communications  channel 
with  impulse  respon  a  {hk>£  ,  the  received  symbols  {x(k)}  can 
be  represented  as 

L 

x  (k)  =  E  hi  S.  .  +  nOO  (5.6) 

i=0  1  K  1 

The  problem  is  to  reconstruct  {S,  }  given  (x(k)}.  One 


possibility  is  to  rewrite  the  all  zero  moc.el  (5.6)  as  an  equivalent 
all  pole  model  which  can  be  identified.  By  passing  the  received 


symbols  through  the  all  pole  inverting  filter,  (S^)  can  be 
recovered. 

It  is  essential  to  the  success  of  this  scheme  that  the 
z-transform  of  the  channel  impulse  response  should  have  its 
zeroes  within  the  unit  z  circle.  In  order  to  keep  the  order  of 
the  inverting  filter  low,  it  is  also  desirable  that  these 
zeroes  do  not  lie  close  to  the  unit  circle. 

The  condition  that  all  the  zeroes  of  H(z)  should  be  within 
the  unit  circle  can  be  relaxed  if  one  is  willing  to  tolerate 
a  non-causal  all  pole  model  for  the  input,  i.e.  x(k)  is  expressed 
in  terms  of  its  own  past  and  future  values  plus  additive  noise 


(k)  =  £  d,  x(k-i)  +  n (k) 


i=-M. 


It  is  still  essential  that  H(z)  should  not  have  any  zeroes 
close  to,  or  on,  the  unit  z  circle.  Communication  channels  with 
a  small  number  of  point  target  reflectors  are  common  examples 
that  do  not  satisfy  this  condition  and  are,  therefore,  unsuitable 
for  the  application  of  inverting  filter  models. 

Several  nonlinear  algorithms,  such  as  decision  feedback 
equalization  and  the  Viterbi  algorithm  for  maximum  likelihood 
sequence  estimation,  for  the  elimination  of  intersymbol  inter¬ 
ference  can  be  found  in  a  survey  article  by  Proakis  [38] . 


Effects  of  additive  noise  and  nonwhite  drivin 


Sometimes  the  output  of  the  all  pole  system  to  be  modeled 
is  corrupted  by  additive  noise  before  it  can  be  observed. 

The  appropraite  mathematical  model  is 


y(k)  =  £  C.  y(k-i)  +  n (k) 
i=l  1 


(5.7) 


x(k)  =  y(k)  +  n  (k) 

CL 


where,  as  before,  {C^}  are  the  unknown  parameters  to  be  es¬ 
timated,  {n(k)}  is  the  unknown  white  noise  driving  sequence  and 
{na(k)}  is  the  unknown  additive  white  sequence.  {x (k)} are _the 
observations  and{y(k)}  are  the  unobservable  outputs  of  the 
linear  system  to  be  modeled. 

If  we  model  the  observations  by  an  AR  process 


x(k)  = 


i  - 

Ji,  C.  x(k-i)  +  n. 
i=l  1  1 


the  resulting  normal  equations  for  the  estimation  of  are  seen 
to  be 


(R  +  R  )  C  =  y 
ZX  Z^  -  “X 


(5.8) 


The  esimates  of  C  obtained  from  this  will  not  equal  C, 
i.e.  bias  in  the  estimation  of  C  is  unavoidable. 


(3.32) 


The  effect  of  this  on  the  AR  spectral  estimate  is  not 
clear.  With  windowed  DFT  spectral  estirates,  the  effect  of 
additive  noise  is  simply  to  add  a  constant  value  to  the  noise- 
free  PSD,  without  affecting  the  "features"  of  the  latter.  This 
is  not  necessarily  true  with  AR  PSD  estimates.  So  caution  has 
to  be  exercised  in  modeling  noisy  data  by  an  AR  process. 


Next  we  consider  the  case  where  there  is  no  additive  noise 
but  the  driving  noise  sequence  (n(k)}  is  non-white.  In  this 
case,  the  AR  coefficient  estimates  based  on  least  squares  will 
be  biased,  and  so,  presumably,  will  the  spectral  estimates  based 
on  these  estimates.  If  the  ACF  matrix  of  {n(k)}  is  known, 
a  Gauss-Markov  parameter  estimation  scheme  can  be  used  instead 
of  a  least  squares  scheme  can  be  used  to  get  unbiased  estimates 
[41]  . 

Another  solution  is  possible  if  the  driving  noise  ACF  is 
known  to  be  much  narrower  than  the  signal  ACF,  i.e.  the  ACF 
of  (x(k)}.  A  simple  example  of  this  is  when  {n(k)}  is  generated 
by  an  all  zero  (moving  average)  process.  If  we  assume  the 
structure  given  by  eq.  (2.1),  with  E(n(k)  n(k+k1))~0  for 

>_  1  for  some  small  value  of  ,  the  equations  for  the 
unbiased  least  squares  estimation  of  C  are  given  by 


y(L1)  =  c1  y(L1-l)  +  c2  y(l:-2)  +  .  .  .  +  cl  y(l1-l) 

Y  (L1+l)  =  C1  ^(L1)  +  C2  y(L1-1)  +  .  .  .  +  CL  y(L1-L+D 


Y(L  +L-1.)  =  C1  y(Lt-L-2)  +  .  .  .  +  Cl  y  (L,-l) 


(5.9) 


Section  VI. a.  AR  spectral  estimation  wi ih  sinusoidal  inputs. 

Recently  published  research  literature  indicates  an  interest 
in  applying  AR  spectral  modeling  to  time  series  consisting  of 
sums  of  sinusoids  and  white  additive  noise  [14,21,28,36,40,44, 
46,52].  The  motivation  for  this  approach  is  that  a  sinusoid  can 
be  generated  by  inputting  white  noise  to  an  all  pole  system  whose 
poles  are  on  the  unit  z  circle.  Unfortunately,  this  system  is 
unstable,  and  the  preceding  discussion  of  AR  spectral  analysis 
is  not  applicable  to  this  case. 

One  may  still  model  the  observed  time  series  formally  by 
an  AR  process,  such  that  the  poles  of  the  PEF  all  lie  on  the 
unit-z  circle  at  z  =  e^0,  where  gjo  =  radian  frequency  of  the 
input,  sinusoii.  The  coefficients  C  can  be  computed  by  solving 
the  normal  equations  (2.3)  and  the  PSD  estimate,  from  (2.5). 

A  simplified  analysis  of  AR  PSD  estimation  with  sinusoidal 
inputs  has  been  attempted  by  Lacoss  [28]  and  later,  by  Widrow 
et  al.  r 5 2 ]  .  For  simplicity,  first  consider  the  case  where 
{x(k)>  consists  of  a  single  complex  sinusoid  of  frequency  fQ, 
and  let  ojq  =  2TrfQ  be  the  normalized  radian  frequence.  Define 


u  =  (1, 


e3aJ°, 


ej2w°, 


e  j  (L-l )  U)Q)  T 


(6.1) 


then  y(k)  =  a25,  +  e^^o  where  5.  .is  the  Kronecker  delta 

K  ,K  3/3 

function.  The  solution  of  the  normal  equations  is 


C 


ejw° 


and 


(equation  continued  on  next  pg.) 


W“>  -  tttT 


L- 1  2 

1  £  e'jn(“°-“>| 

az+L  n=0 


(6.2a 


From  this  expression,  together  with  the  assumptions  that 
L>>]  and  — 2>>1/  it  can  be  shown  that  the  height  of  the  spectral 
peal;  at  co  -  0)o  is  nearly  equal  to 


a 


(6.2b) 


and  the  3-dB  bandwidth  of  the  spectral  peak,  defined  as 


='  SAR<“> 

j..2 


can  be  shown  to  be 


bwar  ’  a2/l,h2 


(6.2c) 


The  corresponding  equations  for  the  conventional  Bartlett 
windowed  DFT  spectral  estimates  are 


s  =  I  l  V*1  ^j(w„-to)n.2  , 


Comparison  of  equations  (6.2)  and  (6.3)  seems  to  indicate 

that  the  bandwidth  of  the  spectral  peak  estimated  via  AR  modeling 

is  nearly  c^/L  times  the  bandwidth  of  the  peak  from  DFT  analysis. 

For  example,  for  an  input  signal  to  noise  ratio  of  lOdB  (a  "1/10) 

and  AR  estimator  order  of  10  (L-10) ,  the  AR  spectral  peak  is 

narrower  than  the  DFT  spectral  peak  by  a  factor  of  100. 

Several  potential  sources  of  trouble  are  ignored  in  this 

analysis.  It  has  been  assumed  that  the  ACF  matrix  of  the 

2  ... 

additive  noise  can  be  replaced  by  a  I.  In  fact,  this  idealiza¬ 
tion  can  be  achieved  in  practice  only  if  N  is  large.  A  more 

careful  analysis  of  the  performance  of  the  AR  spectral  estimator 

2  2 

should  replace  a  l_  by  a  I  +  Na,  where  Na  is  a  random  matrix 
which  accounts  for  the  effects  of  imperfect  estimates  of  the 
noise  ACF  matrix.  The  effects  of  on  the  inverse  of  the  data 
ACF  matrix  can  be  considerable.  The  condition  ratio  -  i.e.  the 
ratio  of  the  largest  eigenvalue  to  the  smallest  -  of  the  data 
ACF  matrix  for  noisy  sinusoidal  inputs  is  on  the  order  of  L/o  . 
When  this  ratio  is  large  -  the  condition  under  which  the  AR 
spectral  estimator  is  claimed  to  have  high  resolution  -  the  effect 
of  small  random  perturbation  on  the  inverse  of  the  ACF  matrix 
can  be  large.  Therefore,  the  high  resolution  of  the  AR  frequency 
estimators  is  offset  by  large  statistical  variability. 

The  expression  for  the  resolution  of  the  AR  spectral  lines 
is  misleading  for  a  second  reason.  The  use  of 


d 


(w) 


C0=(Jq 


as  a  measure  of  the  resolution  of  the  AR 


spectral  estimator  is  questionable.  It  is  well  known  that  the 
ACF  of  the  sum  of  uncorrelated  random  variables  is  equal  to  the 
sum  of  the  respective  ACFs.  For  PSD  estimators  which  perform 
linear  operations  on  the  estimated  ACF,  such  as  the  windowed 
DFT  methods,  the  use  of  the  resolution  measure  given  above  is 
acceptable.  However,  the  inverse  of  two  ACF  matrixes  is  not 
necessarily  equal  to  the  sum  of  the  individual  inverses.  There¬ 
fore,  the  AR  PSD  estimates  are  not  additive,  and  the  bandwidth  of 
one  spectral  line  may  be  quite  strongly  influenced  by  the  presence 
of  another  spectral  line  nearby. 

Computer  simulation  appears  to  be  the  only  proper  method  of 
evaluating  the  performance  of  AR  spectral  estimators  with  noisy 
sinusoidal  inputs.  Results  presented  in  [40],  [44],  and  [46] 
suggest  that  AR  spectral  line  estimation  can  indeed  give  high 
resolution  spectral  estimates. 


Section  VII. 


Computer  Simulation  of  delay  estimation  with 


matched  filters  and  AR  modeling. 

In  section  I,  we  observed  that  the  problem  of  estimating 
time  delays  is  equivalent  to  the  estimation  or  periodicites  in 
frequency  domain.  As  shown  in  previous  sections,  no  adequate 
theoretical  models  exist  for  studying  the  behavior  of  AR  fre¬ 
quency  estimators  with  short  lengths  of  data.  Therefore,  we 
had  to  resort  to  empirical  computer  simulation  to  study  the 
problem. 

A  large  number  of  variables  can  have  possible  effects  on 
AR  delay  estimates.  Initially,  we  judged  that  the  following  were 
the  factors  most  likely  to  influence  the  estimates. 

1.  Signal  to  Noise  ratio  (SNR) :  This  was  defined  as  the 
signal  energy  divided  by  the  noise  variance.  This  definition  is 
the  "output  SNR"  used  in  evaluating  matched  filter  performance. 
The  noise  was  appropriately  bandpass  filtered  so  that  its  band 
occupancy  was  the  same  as  that  of  the  data.  S’ Rs  cf  30,  20,  and 
10  dB  were  considered,  representing  low,  moderate  and  high  levels 
of  additive  noise. 

2.  The  shape  of  the  matched  filter  spectrum  envelope:  Gaussian, 
triangular  and  rectangular  spectral  envelopes  were  used.  These 
correspond  to  waveforms  with  low,  moderate  and  high  side  lobe 
levels  in  the  time  domain. 

3.  Number  of  data  samples  in  the  frequency  domain.  This 
quantity  was  defined  as  the  number  of  frequency  samples  between 
3- a  points  in  the  case  of  gaussian  envelopes,  and  the  zeroes  of 
the  envelope  in  the  triangular  case.  The  variable  was  made  to 
take  on  values  of  32,  64,  and  128  by  appropriate  choice  of  time 
domain  pulse. 


ion- 


4.  Location  of  delay:  Since  the  AR  technique  is  highly 
linear,  it  is  conceivable  that  the  delay  estimates  could  depend 
on  the  actual  location  of  the  echoes.  Three  values  of  delay, 
namely  25,  64,  and  125  were  considered. 

5.  AR  order:  Values  of  4,  8  and  16  were  experimented  with. 
Obviously,  the  order  of  the  prediction  error  filter  is  one  greater 
than  the  AR  order. 

The  algorithm  used  for  generating  additive  noise  was  of  the 
linear  recurrence  type  described  in  Abramowitz  and  Stegun  [55]. 

A  sequence  of  uniformly  distributed  random  variables  was  generated 
by  the  relation  u^+^  =  +  k)  mod  T  where  a  =  129;  b  =  1; 

T  =  235  and  uQ  =  1098765^ 321. 

After  scaling  by  T,  pairs  of  u(o,l)  random  variables  were 
converted  to  gaussian  r.v.s  by  the  relation 

n1  =  /-21n  u^  cos  2ir  u2 
n2  =  /-21n  u^  sin  2tt  u2 

Double  precision  (64  bit)  arithmetic  was  used  to  generate  these 
r.v.s. 

The  simulations  were  run  on  an  Interdata  7/32  32-bit 
machine.  The  Burg  algorithm  and  Levinson  algorithms  were  com¬ 
puted  with  64-bit  precision,  while  the  FFT  calculations  were 
made  with  32-bit  precision. 

The  results  are  presented  in  Figures  1(B)  through  34(B) 
and  1  (L)  through  8 (L) .  The  subscripts  B  and  L  denote  processing 


with  Burg  and  Levinson  algorithms  respectively. 

It  is  hardly  feasible  to  study  all  possible  combinations 
of  all  the  different  values  of  the  variables  listed  above.  In 
order  to  make  the  investigation  manageable,  some  of  tne  variables 
were  omitted  from  further  consideration  when  it  appeared  that  they 
did  not  influence  the  estimates  much  or  when  they  degraded  the 
performance  significantly. 

Figs.  3(B)  and  6(B)  show  that  at  low  SNR  (=10  dB)  low  order 
(=4)  AR  estimates  are  too  flat  and  the  high  order  (=16)  estimates 
show  spurious  peaks.  Therefore,  we  did  not  consider  the  low 
SNR  situation  further. 

Figs.  1(B)  and  4(B),  2(B)  and  5(B)  and  9(B)  and  11(B)  show 
that  for  the  same  SNR,  there  is  litile  difference  between  the 
estimates  with  gaussian  and  triangular  envelopes.  So  it  was 
decided  to  work  further  only  with  gaussian  envelopes. 

It  is  observed  from  fig,  1(B)  through  8(B)  that  the  AR 
delay  estimates  are  indeed  sharper  than  the  matched  filter 
estimates.  Increasing  the  number  of  data  samples  to  128  (Fig. 9(B), 
10(B),  11(B))  do  not  affect  the  AR  estimates  to  any  great  ex¬ 
tent,  while  the  matched  filter  estiamtes  become  sharper.  When 
the  number  of  samples  is  decreased  to  32,  (Fig.  12(B)  and  13(B)) 

AR  estimates  of  order  8  and  16  show  noticeable  bias  and  a  ten¬ 
dency  to  split  the  single  delay  peak.  This  is  in  agreement  with 
the  observation  made  in  Sec.  IV  that  the  variability  of  the  AR 
coefficients  increases  sharply  as  the  order  of  the  AR  model  be¬ 
comes  comparable  to  the  number  of  data  samples.  So  long  as 
the  order  of  the  model  is  much  smaller  than  the  number  of  data 


samples ,  the  AR  estimates  do  not  show  much  variatior^  when  the 
number  of  samples  is  changed.  We  should  add  the  disclaimer  that 
the  few  observations  we  have  made  are  not  sufficient  to  prove 
this  observation. 

Figs.  14(B)  through  21(B)  do  not  indicate  that  the  AR  delay- 
estimate  is  significantly  affected  by  the  location  of  the  delay, 
at  least  for  small  AR  orders. 

The  conclusions  drawn  from  Figs.  1(B)  through  24(B)  can  be 
summarized  as  follows.  The  AR  delay  estimator  does  not  perform 
well  at  low  SNR  or  when  the  order  of  the  AR  model  is  greater 
than  approximately  one-fourth  the  number  of  available  data 
samples.  The  location  of  the  delay  and  the  waveshape  do  not 
appear  to  have  significant  effects  on  the  AR  delay  estimates. 

^igs.  24(B)  through  33(B)  are  the  results  of  simulations  of 
estimation  of  two  delays.  In  figs.  24(B)  and  28(B),  both  the 
matched  filter  and  AR  estimator  show  the  presence  of  two  distinct 
echoes.  In  Figs.  26(B),  27(B),  29(B)  and  31(B),  the  matched 
filter  shows  two  modes,  indicating  two  echoes,  but  the  AR  estima¬ 
tors  show  only  one  peak.  Finally,  in  Figs.  25(B),  28(B),  30(B), 
and  32(B)  neither  the  matched  filter  nor  the  AR  estimator  is  able 
to  distinguish  between  the  two  peaks.  The  two  modes  of  the  16th 
order  AR  output  in  Fig.  30(B)  could  well  be  spurious,  as  can  be 
verified  by  comparing  Fig.  30(B)  and  22(B).  After  this  series 

failures,  did  not  feel  it  worthwhile  to  continue  the 
simulations. 

Fig.  1 (L)  through  8(L)  do  not  indicate  that  any  improve¬ 
ment  is  obtained  by  substituting  the  Levinson  algorithm  for  the 
Burg  algorithm. 
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FIG.  2(B) 

SAME  AS  FIG,  1(B),  EXCEPT  SNR  =  20  DB. 
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FIG.  4(B) 

BURG  ALGORITHM  VS.  MATCHED  FILTER  FOR  DELAY 
ETIMATION  s  DATA  DESCRIPTION  ;  SNR  =  30  DB; 
TRIANGULAR  ENVELOPE;  NUMBER  OF  DATA  SAMPLES  =  64-; 
ACTUAL  DELAY  =  64;  AR  ORDERS  4,  8  8.  16. 
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FIG.  5(B) 

SAME  AS  FIG.  4(B),  EXCEPT  SNR  =  20  OB. 
<s 
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20.00 


FIG.  7(B) 

COMPARISON  OF  BURG  ALGORITHM  &  MATCHED  FILTER  FOR 
DELAY  ESTIMATION  :  SNR  =  30  DB;  NUMBER  OF  DATA 
SAMPLES  =  64- ;  RECTANGULAR  ENVELOPE;  ACTUAL  DELAY 
=  84;  AR  ORDERS  4-,  8  &  16 
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FIG  -  9(B) 

COMPARISON  OF  BURG  ALGORITHM  £  MATCHED  FILTER 
FOR  DELAY  ESTIMATION;  SNR  =  30  DB;  GAUSSIAN 
ENVELOPE;  NUN3ER  OF  DATA  SAMPLES  =  123;  ACTUAL 
DELAY  =  34;  AP  ORDERS  4,  8  £  16. 
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FIG.  12(B) 

COMPARISON  OF  BURG  ALGORITHM  a  MATCHED  FILTER  FOR 
DELAY  ESTIMATION;  SNR  =  30  DB;  NUMBER  OF  DATA 
SAMPLES  =  32;  GAUSSIAN  ENVELOPE;  ACTUAL  DELAY  ^ 
64;  AR  ORDERS  4,  8  i  16. 
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FIG.  13(B) 

AS  FIG.  1.2(8) ,  EXCEPT  S.XR  -  20  OB. 
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FIG.  1 M-C  B ) 

SAME  AS  FIG.  KB)  EXCEPT  ACTUAL  DELAY  =  125 


FIG.  15(B) 

COMPARISON  3F  BURG  ALGORITHM  &  MATCH  ED  FILTER 
DELAY  ESTIMATION;  SNR  =  30  DB;  NUMBER  C 

SAMPLFS  =  128;  ACTUAL  DELAY  =  125;  G-'iCS 
ENVELOPE;  AR  ORDERS  4,  3  1  16 
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1 7  r  a / 

COMPARISON  OF  DELAY  ESTIMATES  WITH  BURG  ALGORITHM 
&  MATCHED  FILTER;  SNR  =  20  DS;  NUMBER  OF  DATA 
SAMPLES  =  G4-;  GAUSSIAN  ENVELOPE;  ACTUAL  DELAY  = 
125;  AR  ORDERS  4.  *  ■?  IS 


FIG.  23(B) 

COMPARISON  OF  BURG  ALGORITHM  4  MATCHED  FILTER  FOR 
DELAY  ESTIMATION:  SNR  =  20  DB;  NUMBER  OF  SAMPLES 
=  64;  GAUSSIAN  ENVELOPE;  ACTUAL  DELAY  =  25;  AR 
ORDERS  4,  8  &  16 


FIG.  211(B) 

DELAY  ESTIMATION  WITH  BURG  ALGORITHM  AND  MATCHED 
FILTER;  DATA  j  SNR  =  30  DB;  GAUSSIAN  ENVELOPE; 
NUMER  OF  SAMPLES  =  64;  ACTUAL  DELAYS  OF  EQUAL 
STRENGTH  AT  64  &  100;  AR  ORDERS  4,  8  4  16 


FIG.  25(B) 

DELAY  ESTIMATION  WITH  BURG  ALGORITHM  &  MATCHED 
FILTER;  DATA  :  SNR  =  30  DB;  GAUSSIAN  ENVELOPE; 
NUMBER  OF  SAMPLES  =  64;  ACTUAL  DELAYS  OF  EQUAL 
AMPLITUDE  AT  64  &  80;  AR  ORDERS  4,  8  4  16. 
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FIG.  26(B) 

DELAY  ESTIMATION  WITH  MATCHED  FILTER  AND  BURG 
ALGORITHM;  DATA  :  SNR  =  30  DB;  GAUSSIAN  ENVELOPE 
NUMBER  OF  SAMPLES  =  64;  ACTUAL  DELAYS  OF  EQUAL 
AMPLITUDE  AT  64  4  84;  AR  ORDERS  8,  16  &  32 
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FIG.  27(B) 

DELAY  ESTIMATION  WITH  BURG  ALGORITHM  4  MATCHED 
FILTER;  DATA  :  SNR  =  30  DB;  GAUSSIAN  ENVELOPE; 
ACTUAL  DELAYS  OF  EQUAL  AMPLITUDE  AT  64  4  84; 
NUMBER  OF  SAMPLES  = L2S;  AR  ORDERS  8  4  16. 
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FIG.  28(B) 

DELPY  ESTIMATION  WITH  BURG  ALGORITHM  &  MATCHED 
FILTER;  SNR  =  30  DB;  GAUSSIAN  ENVELOPE;  NUMBER  OF 
DATA  SAMPLES  =  64;  ACTUAL  DELAYS  AT  25  &  1 0cj ;  flR 
ORDERS  4,  8  &  16. 
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FIG.  29(B) 

DELAY  ESTIMATION  WITH  MATCHED  FILTER  4  BURG 
ALGOR 7THM;  SNR  =  30  DB;  GAUSSIAN  ENVELOPE;  NUMBER 
OF  SAMPLES  =  64;  TWO  DELAYS  OF  EQUAL  AMPLITUDE  AT 
25  4  35;  AR  ORDERS  =  4,  8  4  16. 
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FIG.  31(B) 

DELAY  ESTIMATION  WITH  MATCHED  FILTER  &  BURG 
ALGORITHM;  SNR  =  20  DB;  GAUSSIAN  ENVELOPE;  NUMBER 
OF  SAMFLES  =  *4;  TWO  DELAYS  OF  EQUAL  AMPLITUDE  AT 
25  &  35;  AR  ORDERS  =  8  4  IS 
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FIG.  32(B) 

DELAY  ESTIMATION  WITH  MATCHED  FILTER  4  BURG 
ALGORITHM;  SNR  =  20  DB;  GAUSSIAN  ENVELOPE;  NUMBER 
OF  SAMPLES  =  32;  TWO  DELAYS  WITH  EQUAL  AMPLITUDE 
AT  25  4  35;  AR  ORDERS  8  4  16. 
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FIG.  34(B) 

DELAY  ESTIMATION  WITH  BURG  ALGORITHM  &  MATCHED 
FILTER;  SNR  =  20  DB;  GAUSSIAN  ENVELOPE;  NUMBER  OF 
SAMPLES  =  64;  ONE  DELAY  AT  25;  AR  ORDERS  =  8  &  16 


FIG-  1  CL J 

DELAY  ESTIMATION  WITH  LEVINSON  ALGORITHM  &  MATCHED 
FILTER.  DATA  SAME  AS  IN  FIG.  24(B) 
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FIG.  2 ( L ) 

DELAY  ESTIMATION  WITH  LEVINSON  ALGORITHM  a  MATCHED 
FILTER.  DATA  SAME  AS  IN  FLG.S5C6) 
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FIG.  IKL) 


DELAY  ESTIMATION  WITH  LEVINSON  ALGORITHM  4  MATCHED 
FILTER.  DATA  SAME  AS  IN  FIG.  26(B) 
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FIG.  8(L) 

DELAY  ESTIMATION  WITH  LEVINSON  ALGORITHM  &  MATCHED 
FILTER;  DATA  SAME  AS  IN  FIG.  22(B) 


Section  VIII.  Conclus;ons 


While  the  AR  delay  estimates  appear  sharper  than  the  matched 
filter  estimtes  when  only  one  echo  is  present,  this  property 
apparently  does  not  translate  into  the  ability  to  resolve  two 
or  more  closely  spaced  echoes  reliably.  It  is  easy  to  display 
examples  where  higher  resolution  is  really  obtained,  but  it 
is  also  quite  possible  that  a  single  actual  spectral  line  may 
have  split  into  two  components.  Obviously,  analogous  observations 
apply  to  high  resolution  AR  spectral  estimation  also,  by  inter¬ 
changing  the  roles  of  the  time  and  frequency  domains. 

An  application  of  AR  modeling  to  the  problem  of  estimating 
the  instantaneous  frequency  of  a  frequency  modulated  waveform 
is  discussed  in  Appendix  III. 
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Appendix  -  A.l 

The  Levinson  Algorithm  [29]. 

The  Levinson  algorithm  computes  the  solution  of  the  equations 


A  C  =  v, 


(A. 1.1) 


where  A  is  a  Toeplitz  Hermitian  (LxL)  matrix,  in  a  number  of 


steps  proportional  to  L  . 


f  1  V,  V2 
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Then  using  matrix  inversion  lemmas,  the  following  algorithm 
can  be  derived: 


£1  "  vi 


Yt  =  1  -  Iy, 
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C  =  solution  of  the  equation  A  c  =  v 


(3.93) 


Appendix  -  A  II 

,  ,  N 

The  complex  data  lx(i)}.  ,  is  modeled  by  the  autoregressive 


representation 


L 

x (k)  =  ^  C.  x(k-j)  +  n (k) 
i=l  1 

The  Burg  algorithm  generates  estimates  of  the  PEF  coef¬ 
ficients  without  explicitly  calculating  the  data  ACF  beforehand. 
The  procedure  is  analogous  to  the  Levinson  algorithm. 

Consider  the  equations,  with  1  <  k  <  L, 


(A. 2.1) 


For  k=L,  (A. 2.1)  are  the  normal  equations  for  the  estimation 

A 

of  the  PEF  coefficients.  The  ACF  estimates  y ( i ) ,  i=0,  .  .  .  L 
are  presently  unknown. 


i  N 

Let  P,  -  i  £  ! x (i) |2 
1  i=l 

•  T 

Partition  vector  (1,  -C^  ^ ,  -C^  .  .  .  -C^  and 

(?k+i '  0 '  •  •  •)  into 


(equation  on  next  pg.) 
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k-1, k-1 
-C* 

k-1 ,k-2 


(A. 2. 2) 


(A. 2. 3) 


from  these. 


Ak  Ck,k  Pk 


(A. 2. 4) 


Pk+1  Pk(1“ICk,kl  } 


(A. 2. 5) 


k-1  . 


=  A*  +  Z  Yd)  C*.,^ 


(A. 2. 6) 


Ck,i  “  Ck-1'  i  ~  Ck,k  Ck-l,k-i  for  1  £  1  £  k-1  (A. 2.7) 


If  Y (k)  were  precomputed,  Ck,k  can  be  determined  from 
(A. 2. 4)  and  (A. 2. 6).  This  gives  one  version  of  Levinson's  algorithm 


(3.95) 

for  the  computation  of  AR  coefficients,  as  can  be  verified  by 
comparison  with  the  algorithm  given  in  Appendix  -  I. 

In  the  Burg  algorithm,  C,  ,  is  determined  by  minimizing 
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(A. 2. 8) 


This  is  equivalent  to  running  the  k-th  order1'  PEF  over  the 
data  in  forward  and  backward  directions  and  computing  the  sum 

t  ’ 

r 

of  squares  of  residuals. 
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(A. 2.9) 


Then  using  (A. 2. 7)  and  (A. 2. 9),  (A. 2. 8)  can  be  written  as 


N-k 


k  =  E!  lai 

,K  i=l  1# 


k  "  Ck,k  bi,J  +  lbi,k  “  Ck,k  ai,J 


The  value  of  ^  that  minimizes  this  expression  is 


N-k 


i=l 


L  .  ,  U  .  . 

i,k  i,k 


'k  .  k  M  — k 


,  2 


2 


+ 


(A. 2. 10) 


Equations  {A. 2. 5),  (A. 2. 7)  and  (A. 2. 10),  together  with  the 
1  2 

initial  value  y*  |  x  ( i )  |  ,  give  the  Burg  algorithm.  The 

iterations  are  performed  for  k=l,  2,  .  .  .  L  and  it  is  assumed 
that  any  variable  with  a  zero  subscript  has  the  value  of  zero. 
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Appendix  -  III 

Instantaneous  Frequency  estimation  of  wideband  frequency  modulated 
waveforms  using  AR  modeling. 

If  the  frequency  of  a  noisy  sinusoidal  input  is  time 
invariant  and  an  adequate  number  of  samples  is  available,  the 
usual  Fourier  Transform  methods  offer  a  combination  of  near 
optimality  in  a  decision  theoretic  sense,  good  resolution  and 
efficient  computational  algorithms.  If  the  frequency  varies 
in  a  narrow  band  around  a  known  carrier  frequency,  phase  locked 
loop  techniques  should  be  considered  [22,45].  AR  frequency 
estimation  techniques  appear  to  be  effective  when  the  input 
frequency  varies  in  a  wideband  over  the  Nyquist  frequency  range. 

If  the  input  frequency  varies  slowly  with  time,  the  effective 
time  interval  over  which  the  frequency  is  nearly  constant  is 
limited.  In  such  situations,  the  ability  of  the  AR  spectral 
estimator  to  produce  good  frequency  estimates  with  short  data 
lengths  is  advantageous. 

Griffiths  [21]  has  noted  that  the  ability  of  the  LMS  al¬ 
gorithm  to  track  slow  time  variations  in  regression  coefficient 
estimates  can  be  put  to  use  to  estimate  short  term  spectra. 
However,  this  scheme  does  not  produce  estimates  of  the  instan¬ 
taneous  frequencies  themselves.  The  task  of  recovering  the 
instantaneous  frequency  from  the  PEF  coefficients  still  remains. 
This  is  usually  done  by  computing  the  AR  PSD  using  a  DFT  or  its 
equivalent  and  selecting  the  frequency  at  which  it  attains  a 
maximum.  Even  though  the  AR  LMS  algorithm  is,  in  principle, 
capable  of  updating  the  input  frequency  estimate  at  every  sam¬ 
pling  instant,  in  practice  it  is  feasible  to  do  so  only  at 
widely  spaced  time  instants. 


An  alternative  approach  is  to  collect  a  small  block  of 
samples  at  a  periodic  rate  determined  by  the  rate  of  variation 
of  the  input  frequency,  and,  assuming  that  the  frequency  does 
not  change  within  each  block,  determine  the  frequency  at  the 
block  sampling  instants  using  Burg’s  algorithm  and  a  DFT.  That 
is,  input  samples  {x (kT  +  n) } ,k  =  0 ,  1,  2,  .  .  .  and 
n  —  d'  2,  .  .  .N)  with  T>>N,  are  analyzed  with  Burg's  Algorithm 
of  order  L,  L<<N,  and  the  resulting  frequency  estimate  is  held 
to  be  the  input  frequency  at  time  kT.  By  suitable  interpolation 
the  value  of  f(t)  for  other  values  of  time  t  can  be  determined. 

Two  simulations  were  run  to  compare  the  computational 
times  required  for  the  Burg  and  LMS  approaches.  In  the  first 

example,  a  frequency  modulated  waveform  was  generated  by  the 
formula 


x(t)  -  cos  (i-Hp-JL—))  for  o  <  t  <  5000 


16 


10000' 


=  cos  (3£r  (3-t-^t)  -  1-°°°-°Tr  )  ,  5000  <  t  <  10032 


16 


10000' 


16 


White  noise  of  variance  0.01  (corresponding  to  a  carrier 
to  noise  ratio  of  17db)  was  added  to  x(t)  ,  and  the  sampling  rate 
was  1  Hz. 

The  instantaneous  freuqency  of  x(t)  is 
f(t)  =  (1+t/5000)  ,  0  <_  t  _<  5000 

*  t4-  (3  -  t/5000 )  , 


5001  <  t  <  10032. 


The  input  samples  were  processed  with  a  4th  order  AR-LMS 
algorithm  with  a =  0.04  (eq.  (3.3)).  After  every  500  iterations, 
the  instantaneous  frequency  was  updated  from  the  256  point  DFT 
of  the  PEF  coefficients,  as  discussed  above.  The  times  required 
for  the  computations  on  an  Interdata  7/32  machine  were  13+1 
seconds  for  the  LMS  algorithm  and  9  seconds  for  the  DFT  calcula¬ 
tions  . 

The  same  data  were  processed  with  a  4th  order  Burg's 
algorithm  after  sampling  blocks  of  32  points  each,  every  200 
seconds.  (T=200,  N=32,  L=4).  The  total  times  required  on  the 
same  machine  were:  4+1  sec.  for  the  Burg  algorithm  and  9 
seconds  for  the  DFT.  The  actual  instantaneous  frequencies  of  the 
input,  as  well  as  the  estimates  derived  from  the  LMS  and  Burg 
algorithms,  are  plotted  in  Fig.  2(F). 

In  the  second  example,  the  input  waveform  is  described 
by 

x(t)  =  cos  (250  sin  (2Trt/2000))  +  n(t),  0  _<  t  <_  10032 

with  variance  of  n(t)  being  0.01  and  the  sampling  interval, 

1  sec.  The  instantaneous  frequency  function  in  this  case  is 

f  (t)  =  cos  (2Trt/2000)  . 

This  data  was  processed  with  the  AR-LMS  algorihtm  (L-4, 

=  0.04)  and  Burg's  algorithm  (L-4,  N=32,  T=200) .  In  either 
case,  updating  of  the  instantaneous  frequency  estimates  was 
performed  every  200  seconds.  The  LMS  algorithm  required 
13+1  seconds  for  the  AR  coefficient  computations  and  the 
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3urg  algorithm  took  5+1  seconds  on  the  same  Interdata  machine. 
The  time  required  for  the  DFT  and  peak  selection  procedures 
was  22+1  sec.  in  either  case.  The  results  are  plotted  in 
Fig.  4  (F)  . 

Discussion 

Both  methods  give  reasonable  good  estimates  of  the  input 
frequency.  The  LMS  estimates  have  a  somewhat  lower  error  than 
the  Burg  estimates,  though  the  Burg  algorithm  is  more  than  twice 
as  fast  as  the  LMS  algorithm  for  these  two  examples. 

We  do  not  claim  that  the  Burg  algorithm  approach  is  neces¬ 
sarily  superior  to  the  LMS  approach.  At  lower  SNRs  it  is 
likely  that  the  LMS  algorithm  performs  much  better.  If  the 
input  frequency  estimates  have  to  be  updated  more  frequently, 
the  LMS  algorithm  will  become  computationally  superior.  On 
the  other  hand,  the  LMS  algorithm  could  become  unstable  for 
bad  choices  of  a  [21]  ,  while  the  Burg  algorithm  is  guaranteed 
to  be  stable.  At  high  SNR  and  with  slow  variations  in  the 
input  frequency,  the  Burg  algorithm  is  an  attractive  alternative 
to  the  LMS  algorithm. 
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FIG.  2(F) 

INSTANTANEOUS  FREQUENCY  ESTIMATION  WITH  BURG  &  LMS 
ALGORITHMS.  (  SEE  TEXT  FOR  DATA  ). 

SOLID  LINE  :  ACTUAL  VALUES;  ASTERISKS  :  BURG; 

SQUARES  :  LMS. 

.  I 


•  i 

i  i 


* 


■*-,  * 
'•-i 


1 


20 . 00 


40.00 


60.00 


1 

80 . 00 


1 


100.00 


FIG.  4(F) 

INSTANTANEOUS  FREQUENCY  ESTIMATION  WITH  BURG  &  LMS 
ALGORITHMS.  (  SEE  TEXT  FOR  DATA.  ) 


SOLID  LINE  :  ACTUAL  VALUESj  ASTERISKS  :  BURG; 
SQUARES  :  LMS. 
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0,1  I 


Notation 


The  following  symbols,  conventions  and  abbreviations  will 
be  used  through  this  chapter. 

ALMS  :  Approximate  Least  Mean  Square  algorithm. 

Unknown  time  varying  parameters  to  be  estimated. 
Expectation  operator. 


c 

— n 


E{  } 
ePr 


n 

EWLS 


f 

I 

j 

L 

LMS 

M 


N 

-u 


-1 


q 

Q 


R 

— ss 


R 

-sx 


T 


Prediction  error  at  time  n  =  x  -sc 

n  — n  — n- 1 


Exponentially  weighted  Least  Squares  Algorithm 
frequency  variable 
Identity  matrix 
/=T 


dimension  of  c_n  (numbers  of  unknown  parameters) 


Widrow 1 s  Least  Mean  Square  Algorithm  [51]. 
diagonalizing  matrix  of  R 


-1  T 

M  =  M* 


i.e.  M  A  M-1  =  R 


-ss 


Time  varying  component  of  P  . 
w  — n 


£  2.n  S*  skT  =  estimated  signal 


autocorrelation  matrix, 
exponential  weight. 

weighting  matrix;  all  the  eigenvalues  of  Q  are 
between  0  and  1. 


input  autocorrelation  matrix. 

1  A  1 

AiS  n  L  *1  ii,  •  R  _  is  assumed  to  be  positivi 
*-l  -  definite  and  Hermitian. 


cross  correlation  between  input  vector  and 


observations. 


n 


=  lim  —  Y'  s*  x 
n-*-«  n  -k 


{s^}  :  set  of  known  input  signal  vectors. 

t  :  continuous  time  variable. 

{x^}  :  set  of  scalar  observations. 

{w^}  :  set  of  additive  noise  samples;  assumed  to 

2 

white  with  variance  a  . 

2 

i.e.  E{w,  w*}  =  6,  .  a  . 

K  1  K  /  1 

a  :  gain  constant  in  ALMS  algorithm. 

A  :  matrix  of  eignevalues  of  R  . 

— s  s 

2  :  additive  noise  variance 
a 

P  :  doppler  scale  vector. 

Double  underlining  denotes  a  matrix. 

Single  underlining  denotes  a  vector. 

th 

[X_^]  denotes  a  diagonal  matrix  whose  (i,i)  element  is  X_^. 


Superscripts 

*  ;  denotes  complex  conjugate 
T  ;  denotes  transpose 
:  estimate 

~  ;  Fourier  Transform 

-1  ;  Matrix  inverse 
X  T 

e.g.  c (f ) *  would  mean  the  complex  conjugate 

transpose  of  the  Fourier  transform  of  the  estimate 

of  c(f) . 
k  th 

Q  :  k  power  of  Q. 

^kj  ;  a  matrix  valued  low  pass  filter  whose  impulse 

Z  2 

response  is  (I,  Q,  Q  ,  .  .  .) 


Subscripts  and  parentheses 

subscript  n  :  value  of  indicated  function  at  time  n 

e.g.  £n  means  the  input  signal  vector  at 
time  n. 

parentheses  i,l  :  (i,l)  component  of  the  indicated  matrix 

e.g.  sn(i)  =  the  i  component  of 
parentheses  t  :  value  of  the  continuous  function  at  time  t 

e.g.  s (1) (P(l) • t)  means  value  of  the  first 
component  of  s(t)  at  time  P(l)-t. 


Euclidean  norm  e.g. 


L 

E 

k=l 


s  (k) 


2 


Differentiation  with  respect  to  vector: 


I. 


Introduction 


We  resume  the  study  of  identifying  underwater  acoustic 
channel  models  of  the  form 

L 

x(t)  =  2  C(i)  (t)  s(P(i)  +  W(t)  (1.1) 

i=l 

Knowing  x(t),  s(t)  and  {x^},  the  problem  is  to  identify  the 
time  varying  coefficients  c(i)  (t) .  The  doppler  scales 
{ P  ( i ) }  are  unknown,  and  it  is  desirable  that  the  estimates  of 
c(i)  (t)  are  insensitive  to  the  values  of  (P(i)}. 

As  in  the  previous  chapter,  we  assume  that  all  waveforms 
are  sampled  with  a  period  of  1  sec.,  and  that  the  samples 
will  be  digitally  processed.  All  aliasing  errors  are  assumed 
to  be  negligible.  With  these  assumptions,  we  get  the  discrete 
model 


T 

x  =  c  v  +  w  ,  where 
n  — n  — n  n 


(1.2) 


c  =  (c(l)(t),  c (2) (t)  .  .  .  c (L) (t) I  )T 
— n  I t=n 

vn  =  (s(P(l)  (t-  (1)  ))  ,  .  .  .  s  ( P  (L)  (t-x  (L)  ))}F 


{wn}  is  assumed  to  be  a  white  noise  sequence  with  variance  o 

i.e.  E5w  wH=  5  .  a ^ . 

1  n  ki  n,k 

At  first,  we  shall  set  v^  =  s^  i.e.  no  doppler  shift  will 
be  assumed  to  be  present.  This  restriction  will  be  later  removed 


Technically,  the  assumption  that  (x  (i) }  are  known 
is  not  necessary.  The  channel  can  be  modeled  as  a  tapped  delay 
line,  the  number  of  taps  being  equal  to  twice  the  time  band¬ 
width  product,  2TB,  of  the  channel  [37]  .  Then  equation  (1.1) 
can  be  replaced  by 

2TB 

X(t)  =  £  c  (i)  (t)  s  (P(i)  (t-i)  ) )  +  w(t)  (1.1a) 

i=l 

If  approximate  a  priori  estaimtes  of  (x J  are  available,  the 
number  of  unknown  parameters  can  be  considerably  reduced. 

The  Widrow— Hoff  LMS  algorithm  has  been  used  successfully 
in  many  instances  to  estimate  time  varying  regression  parameters 
[40,52,53,  54  ].  in  most  of  these  cases,  the  input  signals 

are  assumed  to  be  uncorrelated  pseudorandom  noise.  Indeed,  it 
has  proved  difficult  to  analyze  the  operation  of  the  LMS  algorithm 
with  other  kinds  of  input  signals.  Daniell  [  9  ]  has  shown  that 
with  stochastic  input  signals  with  strong  mixing  property 
(i.e.  whose  autocorrelation  function  tend  to  zero  at  increasing 
values  of  lag  in  a  prescribed  manner) ,  the  variance  of  the 
parameter  estimates  using  the  LMS  algorithm  remains  bounded 
for  slow  adaption  rates.  Glover  [  22  ]  has  studied  the  operation 
of  the  LMS  algorithm  with  sinusoidal  inputs. 

Widrow  and  his  coworkers  have  done  extensive  simulation 
and  experimental  studies  with  the  LMS  algorithm  and  suggested 
some  useful  heuristic  analytic  models  to  describe  its  properties. 

Much  of  the  difficulty  in  analyzing  the  performance  of  the 
LMS  algorithm  stems  from  viewing  it  as  a  variant  of  the  Stochastic 


approximation  (S.A.)  algorithms  [46].  This  point  of  view, 
while  undoubtedly  valid,  is  severely  restrictive.  Most  of  the 
work  on  S.A.  has  focused  on  formal  proofs  of  asymptotic  conver¬ 
gence  of  certain  iterative  schemes  for  solving  nonlinear  regres¬ 
sion  equations  with  conditionally  independent  observations. 

These  proofs  are  not  generalized  easily  by  relaxing  some  of 
the  assumptions  made  there  in.  As  far  as  linear  channel  models 
are  concerned,  the  S.A.  formulation  is  needlessly  general  in 
one  sense  (in  that  the  linearity  of  the  model  is  ignored)  and 
restrictive  in  another  (because  of  the  assumption  of  conditionally 
independent  observations) .  Farden  [ |5  ]  has  recently  proved 
that  linear  regression  equations  can  be  solved  by  stochastic 
approximation  methods  even  if  the  observations  are  correlated, 
provided  some  mild  constraints  are  imposed  on  their  fourth 
moments.  Even  this  result  is  incomplete,  since  little  is  known 
about  the  convergence  rate  of  the  algorithm. 

Since  we  are  considering  linear  channel  models,  it  seems 
reasonable  to  try  to  extend  classical  linear  least  squares 
estimation  methods  to  time  varying  problems.  We  shall  take  this 
approach  in  the  succeeding  sections. 

The  organization  of  this  chapter  is  as  follows.  In  section 
II,  we  discuss  some  well  known  methods  of  time  invariant  para¬ 
meter  estimation,  such  as  Gauss  least  squares,  Kalman  sequential 
identification  and  stochastic  approximation.  Section  III  con¬ 
tains  the  development  of  an  exponentially  weighted  least  squares 
algorithm.  Section  IV  is  devoted  to  an  analysis  of  the 
algorithm.  The  "misad justment"  noise,  the  error  in  tracking 


t:.me  varying  parameters,  errors  due  to  additive  noise  and 
the  choice  of  input  signals  for  application  is  doppler  distorted 
acoustic  communications  channel  are  discussed.  Section  IV. 5. 
develops  an  approximation  to  the  proposed  algorithm,  which  we 
call  the  "Approximate  Least  Mean  Squares"  algorithm  (ALMS) . 

We  conclude  with  a  summary  of  our  work,  some  possibilities  for 
future  research  and  a  list  of  references. 

The  assumptions  on  the  input  signal  are  that 


i  n 

1.  lim  —  y*  s,*  s. 
n-°°  n  4-  -k  -] 


T 


The  limit  may  be  in  quadratic  mean  or  an  almost  sure  sense 
for  stochastic  inputs. 

1  n 

2*  AiS  n  £  lsk(1)  I  2  Isk(m)  |  2  <  oo,  l,m  =  1,2,..  .L. 

k 

This  is  to  ensure  that  the  result  of  low  pass  filtering 
T 

the  matrix  (s_*  s_^  }  will  have  bounded  average  power. 

3*  ^ss  is  Positive  definite  and  Hermiltian.  In  this  case,  there 
is  a  matrix  M  such  that  M*T  =  M-1  and  MAM-1  =  R  ;  A  is  diagonal 

We  define  the  Fourier  transform  of  a  matrix  F  as  the  term  by 
term  Fourier  transform  of  each  element  of  F. 

i.e.  the  (k,l)  element  of  the  Fourier  transform  of  F  is 
the  Fourier  transform  of  the  (k,l)th 


element  of  F. 


Section  II. 1. 


Time  invariant  least  squares  algorithms. 

1 .  Classical  Least  squares  algorithms. 

If  the  parameters  c  in  eqn.  (1.2)  are  time  invariant, 
the  least  squares  estimate  of  c  at  time  n  is  well  known  [57] 
and  is  given  by 


2sjxk) 


(2.1.1) 


This  can  be  expressed  equivalently  as 


=n  ■  (jZs{skT)"1(KE  iJV 


1  t  —  1  A 

Defining  P  =  (  —  V s*s,  )  ,  c  can  be  expressed  in  the  form 

— n  n  — k— k  — n 

of  an  iterative  algorithm  [  ]: 

T  A 

P  s’*  (x  ,  .  -s  c  ) 

/n  /v  — n  — n+1  n+1  — n+1  — n 

c  =  C  +  „  ,  -  ■  ■  -  ■  -  - - - - - 

n+1  n  .  T  _ 

n  +  s  ,  .  P  s  *  , 

— n+ 1  — n  —n+1 


P  s* .  s  P 
,  .  -n  -n+1  -'n+1  -n 

P  =  n  +  1  rP  _  Z _ =Z_ 

zn+l  n  =n  T  * 

n  -n+1  --n  -n+1 


(2.1.2) 


IT  1 

Since  lira  —  s.  s*  -*■  R  and  lim  —  s*x.  -*■  R  ,,  almost  surely 

fi^oo  n  -k-k  -ss  n-^OT  n  — k  k  — sx 


and  in  quadratic  mean,  it  is  easy  to  show  that  cn  -*■  c  almost 
surely  and  in  quadratic  mean  [lo^ys)  .  It  is  not  quite  clear  what 
advantages,  if  any,  the  recursive  formulation  (2.1.2)  has  over 
(2.1.1  ) 


Section  II. 2 


A.  Relationship  of  MMSE  estimation  with  Kalman  Filtering. 

Assuming  the  same  model  as  in  the  previous  subsection, 

xk  =  skT  c  +  wk  (2.2.1) 

We  wish  to  find  the  linear  lease  squares  estimate  of  c,  i.e.  the 
value  of  c  that  minimizes 

E{(xk-skT  c)T  (xk-skT  c) }  (2.2.2) 

This  problem  can  be  cast  in  a  form  to  which  the  sequential 
state  estimation  algorithms  due  to  Kalman  and  Bucy  00,31]  can  be 
applied.  An  extensive  discussion  of  Kalman  Filtering  methods 
can  be  found,  among  other  sources,  in  [3],  [8],  [17],  [23],  [30], 
[31],  [38],  [39],  [43].  The  following  discussion  is  based  on 
the  work  of  Brown  [6],  Chien  and  Fu  [7],  de  Figureido  [12]  and 
Jones  [ 27 ] . 

The  "state  equation"  for  c  can  be  written  as 


— k+1  -k 


(2.2.3) 


x, 


+  Wk 


(2.2.4) 


The  trivial  equation  (2.2.3)  simply  expresses  the  fact  the  c 
is  time  invariant. 

The  algorithm  for  the  sequential  updating  of  the  estimate 


£n  can  easily  derived  (see  the  cited  references)  and  is 


known  to  be 


-n+1  -n  +  -n  -n+l^-n+1  -n  -n+1  +  a  ^ 


2,-1 


(x  .  .  -  s  c  ) 

n+1  —n+1  — n 


(2.2.5) 


r^+l  ^n  ^n+1  ^-n+1  ^n  -n+1  +  0  ^  -n+1  -n 


2,-1 


(c^  and  arbitrary) 

P 

If  we  replace  P_n  by  ~  ,  this  algorithm  is  identical  to 

— a 

the  adaptive  sequential  least  squares  algorithm  of  the  previous 
subsection  (equations  (2.1.2.)). 

B*  Extension  to  Nonlinear  Regression  Problems. 

If  the  relation  between  the  observations  (xk>  and  the  unknown 
parameters  (£k)  is  nonlinear,  the  method  of  quasilinearization 
can  be  used  to  derive  a  sequential  adaptive  algorithm.  In  this 
case,  the  Kalman  estimator  model  is 


^n+1  — n 


(2.2.3) 


x„  =  f  (s  , c  )  +  w 
n  n  — n  —  n  n 


(2.2.6) 


where  (f  (s ,c ) }  are  known  functions  of  s  and  c  .  It  is  assumed 
11  11  11  — n  — n 

that  f n  ( . ,  c^)  is  continuously  differentiable  with  respect  to  £n» 
Expanding  (2.2.6)  in  a  Taylor  series  about  the  latest  (and 
presumably,  the  best)  available  estaimte  of  c,  viz.  c  ,  ,  and  re- 


(4. 12) 


taining  only  the  first  two  terms 


3f 


x 

n  — 


f„(s,c  . )  + 
n  — n  — n- 1  a  c 


n 


— n 


(sn 'Cn-1  ) 

— n  — n— 1 


(c-c  , )  +  w  (2.2.7) 
- n-1  n 


Define 


3f 


n 


H  n  =  TTT-  (S  ,C  ) 
— n  do  — n  — n— i 

— n 


Equation  (2.2.6)  can  be  written  approximately  as 


x  -f  (s  ,  c  . )  +  H  T  c  ,=HTc+w 
n  n  — n  — n-1  — n  — n-1  — n  —  n 


(2.2.3) 


Equations  (2.2.3)  and  (2.2.8)  are  in  a  form  to  which  the 
results  of  Sec.  II. 2. A  can  be  applied.  The  equations  for  the 
sequential  estimation  of  c,  similar  to  equations  (2.2.5),  turn 
out  to  be 


c  ,  =  c  +  P  H  ..  (H  ..  P  H  .. 

— n+1  — n  — n  — n+1  — n+1  —  u  — n+1 


+  °2>'1  (xn+l  '  fn+l<=n+l'£r.) 


—n+1  — n  ^n  —n+1  —n+1  — n  n+1 


(H 


+  0  ) 


2,-1 


Mn+1  £n 


(2.2.9) 


Convergence  of  this  algorithm  is  hard  to  prove.  Following 
Brown  [6],  it  appears  that  the  algorithm  will  convergence  if 
f ^ (  •  /  •  )  is  well  behaved  in  the  sense  that  it  satisfies  certain 
Lipshitz  conditions. 

C .  Extension  to  time  varying  regression  analysis. 

The  state  equations  that  describe  the  "evolution"  of  c 
the  lc.st  two  subsections  assume  that  c  is  time  invariant. 


possible  extension  of  equation  (2.2.3)  is 


— n+l  =  V^n1  +  wln 


(2.2.10) 


where  w^n  ^-s  a  sequence  of  stochastic  forcing  functions  whose 

covariance  is  known.  It  is  assumed  that  w.  is  uncorrelated  with 

in 

wn  ec5*  (2.2.4).  (gn  (  •  )}  is  a  known  sequence  of  functions. 
Kalman  filtering  equations  can  be  applied  to  this  model  to  update 
{c^}  sequentially.  Since  the  assumption  that  the  exact  functional 
form  of  the  evolution  of  {c^}  is  known  appears  to  be  verv  restric¬ 
tive,  we  shall  not  give  details  of  the  derivation  here. 

Section  II. 3. 

Stochastic  Approximation. 

One  version  of  the  stochastic  approximation  algorithms  for 
the  estimation  of  the  time  invariant  parameters  c  in  eq.  (2.2.1) 
is  of  the  recursive  form 


:n+i  -  cn  +  a  (x  -s  c  )s 
n+i  n  n  n  — n  — n  —  n 


(2.3.1) 


where  an  is  a  time  varying  gain  sequence  such  that  Tot  =  , 

^  n 

niS  ctn'>0 *  A  Popular  choice  of  ctn  is  an  =  1/n. 

Under  some  weak  conditions  on  the  fourth  cumulants  of  x 

k 

A 

and  s^ ,  Farden  [15]  has  shown  that  lim  c  -»-c  almost  surely.  A 
considerable  body  of  literature  exists  on  the  theory  and  applica¬ 
tions  of  stochastic  approximation  methods  for  the  solution  of 
nonlinear  regression  equations  [1,3,11,13,15,17,21,29,32,44,50,  to 
list  a  few] . 


The  main  advantages  of  equation  (2.3.1)  are  that  it  requires 
relatively  few  storage  locations  and  multiplications  per  iteration 
Its  limitations  are  that  its  convergence  rate  is  usually  slow,  it 
cannot  track  time  variations  in  c  and  its  error  propagation 
behavior  is  unknown.  Moreoever,  for  linear  channel  models,  other 
efficient,  well  understood  algorithms  for  the  estimation  of  c 
exist  (eq. (2. 1. 1) ) .  In  view  of  these  objections,  we  omit  more 
detailed  discussion  of  stochastic  approximation  algorithms. 


(4.15) 


Section.  III. 

Sequential  least  squares  algorithms  for  approximate  estimation 
of  5 lowly  time  varying  regression  parameters. 

III-l.  Introduction 

In  the  previous  section,  it  was  assumed  that  the  linear 
channel  to  be  identified  was  time  invariant.  This  assumption  is 
not  very  realistic  when  modeling  underwater  acoustic  communica¬ 
tion  channels. 

In  this  section,  we  permit  c  to  be  time  varying,  while  still 
retaining  the  assumption  that  the  channel  is  linear.  We  consider 
models  of  the  form 


x 


n 


+  w 

n 


(3.1.1) 


and  the  problem  is  to  identify  the  time  varying  parameters  c 

— n 

given  (xn>  and  {sn>  .  All  the  variables  in  eq.  (3.1.1)  are  assumed 
to  be  complex. 

If  an  ensemble  (s  }  of  inputs  and  {x  of  outputs  is 

i=i  n  i=i 

available  at  each  time  instant  n,  {c^}  can  be  estimated,  at  least 
in  principle,  by  formulating,  the  classical  regression  problem 


x 

n 


i 


s 

— n 


iT 


c 

— n 


+  w 


n 


1,2, 


I. 


(3.1.2) 


It  may  be  feasible  to  transmit  only  one  input  at  every  in¬ 
stant  of  time,  i.e.  1=1  in  equation  (3.1.2).  In  this  case, 
only  one  equation  is  available  to  solve  for  the  L 


components  of 


for  each  n.  Unless  some  structure  is  imposed  on  the  nature 
of  the  functions  £  ,  the  problem  is  unsolvable. 

One  possibility  is  to  represent  over  a  finite  interval 
of  time  n^  <_  n  <_  as  a  linear  combination  of  smail  number  of 

known  basis  functions.  c  is  modeled  as 

— n 

I 

C  =  V  c  ( i )  <D  (i)  (3.1.3) 

— n  /  v  —  n 

i=l 

Now  the  problem  of  estimating  c^  over  the  time  interval 

n^  <_  n  <_  n^  reduces  to  that  of  estimating  the  (IL)  time  invariant 

parameters  c(i) ,  i  =  1,  ...I. 

The  simplest  choice  of  { n  ( i )  }  in  eq.  (3.1.3)  is  to  take 

<j>  (0)  =  1,  $  (i)  =  0  for  all  other  i,  i.e.,  c  is  assumed  to  be 
n  n  — n 

stepwise  constant.  More  general  basis  functions,  such  as  poly¬ 
nomials  and  trigonometric  functions,  have  been  familiar  to  time 
series  analysts  for  a  long  time.  Models  of  the  form  (3.1.3) 
have  proved  to  be  useful  in  many  engineering  applications,  such 
as  linear  predictive  analysis  of  speech  [36],  system  identifica¬ 
tion  [34]  and  adaptive  antenna  arrays  [41], 

Markov  modeling  of  £n  and  the  associated  sequential  "state" 
estimation  algorithm  have  been  discussed  in  the  previous  section. 

Quite  often,  nothing  is  known  about  the  structure  of  £n 
except  that  it  is  slowly  time  varying.  In  the  rest  of  this 
section,  we  shall  consider  sequential  least  squares  algorithms 
that  give  approxiamte  estimates  of  c  . 


(4. 17) 


Section  III. 2. 

An  Exponentially  weighted  Least  squares  Algorithm. 


The  model  (3.1.1)  is  assumed.  If  c^  =  c,  a  time  invariant 

vector,  the  linear  minimum  mean  squares  estimate  of  c  given  {x  } 

—  n 

and  {s  }is 
n 


c  =  (—  V  ’  s  *s. 
-ii  n  2—i  ~k  — k 

k=l 


V1  <E±« 


n  k  -k 


(3.2.1) 


The  derivation  leading  to  this  estimate  treats  all  the  inputs, 

{Sj,}  and  observations,  {x,  },  with  the  same  importance.  If 

k=l  * 

c  were  slowly  time  varying,  however,  it  seems  more  reasonable 
to  attach  decreasing  importance  to  past  inputs  and  observations. 

The  idea  is  not  new;  Jones  [27]  and  Harris  [25]  have  used  a  similar 
technique  for  the  updating  of  autoregressive  model  parameter 

nr 

estimates . 

We  replace  sk*skT}  and  sk*xk>  in  equation  (3.2.1)  by 


T 

— n,k  — k*— k  ^  an<^  ^^n,k  — k  — k*^  respectively .  (W^  k)  is  a  sequence 


of  weight  matrices  such  that 


n 

EW  .  =  I 
/  — n,  k  — 

k=l  “ 


Vk  =  £  ^  n>k 
^  k  4  0  as  n  -  k  +  ». 


The  only  sequence  {W  ,  }  that  we  have  found  useful  for  the 

n  f  K 

purpose  of  sequential  least  squares  algorithms  is  of  the  form 


W  .  -  (I-C/V1  (I-Q)  *Qn-k  (3.2.2) 

— n  r  is.  “  —  —  — 

where  the  matrix  Q  has  all  its  eigenvalues  in  the  open  interval 

(0,1)  • 

With  this  modification,  equation  (3.2.1)  is  replaced  by 
£n  =  (I-Q11)"1  (I-Q)  T,  Qn_k  sk*  skT)_1. 

n  1  n  . 

((I-Q  )~  (I-Q)  Er  X,  s  *)  (3.2.3) 

-  - k=l—  K 


Let 


P  =  (  (I-Qn)_1  (I-Q)  Zr  s*  s,  '1) 
_n  _  II  - k=l—  *  K 


n 


.n-k 


T.  -1 


(3.2.4) 


and 


d  =  (I-Q11)"1  (I-Q)  £  Qn”K  si 

_n  =  =  ==  k=ll  K  H 


n 


n-k  „  .  * 

■k 


(3.2.5) 


Then 


.n+1,-1  „n-k  T 


£n+l  =  (I-ft)-ft-  E^  sk*sk^  + 


(I-Qn+1)_1  (I-Q)  s 


T,  -1 


-  — n+1  — n+1 


=  (  (I-Qn+J')  _1  (I-Qn)Q  P  "1  + 


—  —  — n 


/T  /  t  r,\  *  1-1  “1 

<1-2  )  <I-Q>  £n+1*  £n+1  } 


(3.2.6) 


We  now  make  use  of  the  matrix  inversion  lemma  [431 


(I+a  b  )  =  I  - 


1  +  b  a 


(3.2.7) 


Applying  this  result  to  (3.2.6) 


n,  -1 


P  Q  (I-Q  )  (I-Q)  s 


n+1  — n+1 


1  +  £n+1T  Zn  S'1  (i-Qn)_1(i-Q>ST. 


Pn  cf1  (i-q")'1  <i-Qn+1) 


(3.2.8) 


dn+i  *  (I-Qn+1)_1  (I-Qn)*Q  dn  +  (I-Qn+1)_1  (I-Q)  x 


x  s  * 
—n+1  —n+1 


(3.2.9) 


Then  £n+^  =  '£n+1  ^.n+i  can  be  calculated  to  be 


c  =  c 
n+1  cn 


+  {in  Q~  <i-Sn> 

{1  +  ^+1T  in  2"1  (I-Qn)'1(I-Q)sn+p 


(3.2.10) 


Equations  (3.2.8)  and  (3.2.10)  constitute  the  exponentially  weighted 

least  squares  (EWLS)  algorithm.  The  term  (x  ^..-s  ,.T  c  )  is  the 

n+i  —n+1  — n 

prediction  error  at  time  n+1.  Since  all  the  eigenvalues  of  Q 

are  in  (0,1)  ,  Q1"1-*-^  as  n-*-°°  in  norm.  Using  this  fact,  the  asymptotic 

EWLS  algorithm  can  be  written  as 

P  Q”1  (I-Q)  (x  .  -s  X.T  l  )  s * 
a  a  — n  —  —  —  n+1  — n+x  — n  n+1 


Section  IV 


Analysis  of  the  EWLS  algorithm. 


This  section  is  devoted  to  the  study  of  the  EWLS  algorithms 
ior  two  Q  matrices: 


9  fin  >  "  I  '  0  <  q  <  1  .and 


2  =  9.(2)  =  (I  +  ct  Rgg)  x  M  [  (l+aXi)"-L]M‘ 


Q(]_)  is  the  simplest  choice  of  Q.  The  utility  of 
is  in  deriving  a  computationally  simpler  Approximate  Least 
Mean  Squares  (ALMS)  algorithm  (Section  IV. 5). 

Since  the  eigenvalues  of  Q  must  lie  between  0  and  1, 
it  is  clear  that 


* 


0  <  a  <  —  /  where  X  is  the  largest  eigenvalue  of 

max 

^ss*  This  is  a  necessary,  but  not  sufficient  condition  for  the 
stability  of  the  EWLS  algorithm. 


Section  rv.l. 

Properties  of  P^-^ 

— n  was  *iefine<i  in  Sec.  Ill  (eq.  3.2.4)  for  large  values  of 

n  as 


£n~  =  (1-2)  E  Qn"  s*  s^ 
—  k — 1 


(4.1.1) 


The  key  to  the  analysis  of  P  -1  is  to  recognize  that  P  _1 

— n  — n 

is  the  output  of  a  matrix  valued  low  pass  filter  {Q^}  whose 

T  —  i 

input  is  {s*  s,  }  .  Therefore  the  mean  value  of  P  -i  is  R 


regardless  of  the  choice  of  Q  and  the  nature  of  the  signal. 

The  variable  component  of  ^  is  harder  to  determine  and 
is  strongly  dependent  on  the  exact  structure  of  the  signal  and 
the  choice  of  Q.  We  shall  consider  the  following  cases. 


T 

1.  0*0/,.;  and^s*  s  }  has  a  line  spectrum  at  frequencies 

f(l),  f(2)  .  .  f (m) .  This  will  be  the  case,  for  example,  if 

is  a  periodic  pulse  train  or  a  sum  of  sinusoids.  In  this 

tVi  -  1 

case,  the  (i,l)  A  element  of  P  is  the  form 

—  n 


Pn_1  (i.D  -  £g(i,i)(l)  e-j2^f(m).n 


Pn'1  (i,l)  -  Pn‘1  (i,l)  » 


(i-q)-  E 


m  1  -  o  e*j,r£(m' 


g  (i/1)  (m)  . 


similarly,  for  Q  -  Q^r 


(P  (k, 1)  -  P  _1  (k, 1) ]  has  the  form 
— n  — n 


ii  a 

?  l+A .  a-e-^ 2lf 


g '  (k ,  1)  (f  (m) ) 


i=l  m  i 


where  g' (k,l) (f (m) )  is  a  bounded  linear  combination  of  the 
T 

elements  of  s*  s,  . 

-k  -k 


These  expressions  will  be  uniformly  small  if  each  f (m) 
is  not  within  the  passbands  of  the  low  filters  {Q  }.  If  a  period 


pulse  train  is  used  as  input  signal,  it  is  necessary  that  its 
pulse  repetition  period  be  much  smaller  than  the  time  constants 
of  the  averaging  filter,  Q. 


T 

Q  -  Q (jj  ;  { s^  }  has  a  continuous  bounded  spectrum.  An 
example  of  this  is  the  pseudorandom  noise  sequence.  In  this 
case,  the  average  power  (or  variance)  of  the  variable  component 
of  each  element  of  P  is 


var 


(i» 1)  }  = 


1/2 

/ 


-1/2 


(1-cr) 


1  -q  e 


-  j  2-n-f  |  2 


!?*  (i)  sk(l)  (f)  |2  (4.1.3) 


where  f  indicates  that  the  integral  is  evaluated  after  omitting 
the  spectral  line  at  f  =  0. 

If  g"(i,l)  =  max  j  s*  (i)  s,  (1)  (f)  I  2  <  »  , 

£  K  K 

then 

Var  {(Pn-1  -  Pn-1)}(i,l)  <  (1-q)  •  g"  (i,  1) 

so  that  for  small  (1-q) ,  the  time  varying  component  of  P 

— n 

has  small  power. 

Similarly,  for  Q  =  Q  , 


Var  {  (P^  -  _Pn  )  (i,l)  }  <_  a  b  (i,  1)  ,  where  b  (i.  1)  is  some 

bounded  constant. 


4.2 


Stability  of 

Since  ^  is  a  time  varying  matrix,  there  is  little 
guarantee  that  it  is  invertible  at  all.  We  can  reasonably  hope 
that  if  the  sum  of  the  mean  power  of  the  variable  components 

of  all  the  elements  of  P  1  is  much  smaller  than  the  smallest 

— n 

eigenvalue  of  its  mean,  viz.  R  ,  then  P  ^  would  be  invertible 

— s  s  — n 

Due  to  the  complexity  and  signal  dependence  of  the  expressions 
for  the  time  varying  components  of  P^  ^ ,  it  is  hard  to  make  any 
general  statements  about  the  conditions  for  the  stability  of  the 
EWLS  algorithm. 

As  an  illustration,  assume  that  the  input  signals  are  inde¬ 
pendent  gaussian  random  variables  such  that  R  =  X  I.  In  this 

— s  s 

case,  with  Q  =  , 


P  -1  =  X  I 
— n  — 


Var  (P  "  (k,l) } 
— n 


X2  for  k  *  1 


2A 2  for  k  =  1, 

1-q 


In  order  to  obtain  adequate  stability,  we  set 


Yj  Var  {P  -1  (k,l)  }  <  X*  -  X: 

kTl  n  -  "ln 


i.e.  (L+L'*)  (1-q)  <  1  +  q  or 


(1-q)  < 


(4.25) 


with  Q  -  Q  ,  a  <  — -Z -  . 

-  -  ;  (L2+L)X 

Anticipating  the  results  of  Section  IV. 5.,  this  is  our  con¬ 
dition  on  a  so  that  the  LMS  algorithm  is  stable,  with  independent 
gaussian  incuts.  A  more  accurate  analysis  by  Merriam  [39]  indicates 
that  the  LMS  algorithm  will  be  stable  if 

2 

a  —  L  +  2  *  Therefore,  our  criterion  for  stability  is 
rather  pessimistic. 

We  shall  denote  Pn_1  by  P^-1  +  n^,  where  lumps  togetehr 

all  the  time  varying  components  of  P^-1.  Hopefully,  each  element 
°f  ^  ^as  small  variance,  and  therefore  we  are  justified  in  making 
the  approximation 


Section  IV. 2. 


Tracking  Behavior  of  the  EWLS  algorithm 

The  asymptotic  expression  for  the  estimate  of  cr  at  time  n 
(eq. (3.2.3))  can  be  written  as 


c 

— n 


((I-Q)  Z  Qn‘k  sj  skT)_1 
_  _  k=l-  ■  k 


(I-Q)  Z  Qn~ksJ  (£kT£k+;';;: }  ( 4 • 2 * 1} 
k=l— 


If  c^  =  c,  a  constant,  £n  is  an  unbiased  estimate  of  £. 


Since  c  is  linear  in  c  and  w  ,  the  effect  of  additive  noise  will 
— n  — n  n 


be  considered  separately  from  that  of  time  variation  of  £n«  In 


the  rest  of  this  subsection,  wr  will  be  assumed  to  be  zero. 


A  step  variation  in  £n  is  easily  analyzable.  Let  £n  =  £^^ 


for  n<n.  and  c  =  c,_.  for  n>n. .  Then  at  time  n,+n, 
l  — n  —  ( 2 )  —  1  1 


w^+n 


— n+nn 


P  +n  {(i-Q  Z  Q 

-n+nl  r  =  k=l  = 


n  +n-k 

s*  s  r  + 

-k  — k  S(2) 


n. 


(i-a>2n  X  an‘kaj  ikT>  (c(i)-c<2)> 


(4.2.2) 


The  first  term  of  (4.2.2)  is  recognized  as  c^)*  T*le  seccn<^ 

term,  which  represents  the  error  in  the  e.stiamte  of  £n 

n  ^ 

approximated  equal  to  Q  (c^-c^))  anc^  approaches  zero  at  an 
exponential  rate. 


Next  we  consider  the  case 


c  =  P  ^-Q)  y  (Q  e  J  ^1TrO)  n-Ks*  s  T  c  e3  27Tfo 

_n  — n  -  z  Z  -k  “k  L(l)  6 


(4.2.3) 


The  weights  (  (Q  e  j27rfO)kj  represents  the  impulse 


response 


to  a  first  order  bandpass  filter  whose  freuqency  is  at 

-fQ.  This  bandpass  filter  operates  on  the  sequence  {s*s  T}  and  the 

K~K 

means  value  of  the  output  of  the  filter  ~  (I-C  e”"2  27Tf o)  _1r 

“  "  — s  s  ’ 

The  variable  component  of  the  output,  denoted  by  N  is  highly 
dependent  on  the  detailed  signal  structures  and  can  be  treated 
in  exactly  the  same  manner  as  in  the  analysis  of  P  -1.  (Sec. IV. 1). 


£n  :  ^s"1  +  Sss'1  £ss_1>  <I-Q)  (1-2  e-^^O)-1 


<|ss  +  V>  £(1)  e‘j21rf° 


~  R  s  1  (I-Q)  (I-Q  e"j27rfO)-1  R 
_ ■=> a  —  —  —  — S  s 


-  5u  a^'hl-Q)  (I-Q  e-^^O)-!  Rss 


+  Sss"1  (I-Q)  (I-Q  e'j2TrfO)"1  Nu'}.  c 


For  Q  =  ql_, 


A  ^ 

:  ,77^*0  £n  +  - Iss"1  <-u'"-u,£n  (4.2.4) 

l-q  e  1-q  e  ]27Tt°  “  “ 


and  for  Q  =  (1+  oi  R  _)“1. 


-  1  +  A  .  a  -  e  J  ° 

1 


,  A .  a  , 

-  R  _1  N  M  [ - - - rv--,— 1  c 

=ss  =u  =  1  +  X.a  -  e"3 21Tf  o  z  -n 

1 


+  M  [ - ^ -  -]  M~ 1  N  '  c 

—  l  +  A^a-eJ  °  —  —  — 


(4.2.5) 


We  showed  that  each  term  of  N  (and  similarly  N  ')  is  a  time 
varying  function  whose  variance  (or  power,  in  the  case  of  deter¬ 
ministic  signals)  is  proporational  to  (1-q)  or  ct,  according  as 
Q  =  q  I  or  Q  =  (I4-  a  R^)  1. 

Since  (1-qi  and  a  are  assuemd  to  be  small,  the  effects  of 

N  '  on  c  are  second  order  effects  compound  to  the  first  terms  of 
— n 

(4.2.4)  and  (4.2.5).  Further,  in  order  to  ensure  uhat 

c  ~  c  ,  it  is  essential  that 
— n  — n 


.  A  .  a 

-  ^  WttV  ~  ~  1  and - - - ~  1. 

1-q  e""^T^°  1  +  A. a  -  e"j2^f0 


Under  this  condition,  N.'  and  N  '  will  be  highly  correlated,  since 

— u  — u 

»p 

they  are  the  results  of  filtering  the  same  sequence  {s£s^  }  with 
two  linear  filters  with  highly  overlapping  passbands.  Therefore 
each  element  of  N '-  N  will  be  negligible  in  comparison  with  the 

VJ  “ u 

first  tevms  of  (4.2.4)  and  (4.2.5). 


c 


for  Q  =  a  T 


(4.2.6) 


Since  is  linear  in  c^,  the  equations  (•'1.2.6)  and  (4.2.7) 
can  be  generalized  to  the  case  where  c^  consists  of  sums  of 
sinusoids,  and  in  the  limit,  has  an  arbitrary  spectrum  c(f) . 


(4.2.9) 

A 

These  expressions  for  c^  show  that  it  is  highly  dependent 
on  the  spectrum  of  £n.  In  addition,  for  Q  =  Q2,  £n  also  depends 
on  the  data  correlation,  via  M.  if  special  forms  are  assumed 
for  £ ( f )  and  M,  it  may  be  possible  to  evaluate  these  integrals 
explicitly.  For  example,  if  M  is  taken  to  be  the  identity  matrix 
an<^  £n  ■*-s  assumed  to  have  a  uniform  lowpass  spectrum,  a  rational 
spectrum  or  a  harmonic  spectrum  (4.2.8)  and  (4.2.9)  may  be  evalua¬ 
ted  explicitly.  We  shall  not  carry  out  the  details,  but  simply 
point  out  that  if  the  bandwidth  of  each  component  of  c(f)  is 
much  smaller  than  the  bandwidth  of  the  matrix  low  pass  filter  whos 

transfer  function  is  (I-Q)  (I-Q  2Trf )  ”1  J  ~  c 

Z  Z  Z  Z  '  — n  ~  —  n’ 

An  important  point  should  be  noted  if  periodic  pulse  trains 
are  used  as  input  signals.  If  the  input  signal  has  a  period  T, 
the  spectrum  of  (s£  skT}  will  have  h  .monic  components  at  fre¬ 
quencies  of  integral  multiples  of  1/T.  It  is  essential  to  ensure 
that  1/T  is  much  larger  than  the  sum  of  the  bandwidths  of  the  low- 


ic 

pass  filter  {Q  }  and  the  largest  significant  frequency  component  of 
c  (f )  ,  f  .  Otherwise  the  bandpass  filter  {  (Q  e  ^°)^}in  eq.  (4.2.3) 

-  A 

will  introduce  significant  time  varying  components  in  c  . 


Section  IV. 4. 


Choice  of  input  signal  for  adaptive  estimation  in  doppler  corrupted 
communication  channels. 

Until  now,  we  have  ignored  all  doppler  shift  effects  and 
assumed  that  the  observations  {xk}  can  be  modeled  as  a  noisy 
linear  combination  of  the  undistorted  input  signals.  In  this 
subsection  we  consider  a  more  general  model  of  the  form 

xk  =  -kT  -k  +  wk  (4.4.1) 

where 

v^  is  defined  by 


vkU) 


Vk  -J"vk<2) 


VL> 


t=k 


v(i)(t)  -  s(i)(P(i)*t),  i  =  1,2, ...L;  P(i)  =  doppler  scale 

.  th 


associated  w^th  the  i  component  of  £(t) . 

In  order  to  focus  on  the  influence  of  the  wideband  ambiguity 


function  of  s(t)  on  c  ,  we  simplify  matters  by  assuming  that 


£n  ~  £  ~  constant  and  w^  =  0.  We  have  shown  in  Sections  IV. 2.  and 


IV. 3.  how  these  restrictions  can  be  removed.  With  these  assumptions 

£n  -  In  <I-a>  a"'k  ak*  ^T)c  (4.4.1a) 

Def ine 


n 


D  '  =  T  q: 
-n  — 


n-k  *  T 

V  21. 


(4.4.2) 


D 

— n 


(4.4.3) 


is  the  output  of  passing  the  sequence  {s_£  }  through  a 


•k  -k 


lowpass  filter  with  coefficients  {Q  )  so  that  the  mean  value  of 


IV  is  the  time  average  of  {s*  v^  }. 


The  (hopefully  small)  variable  component  of  IV  will  depend 


on  the  spectral  characteristics  of  {s*  v^  }. 


th 


Therefore,  the  mean  value  of  the  (i,l)  element  of  V  is 


n 


D'  (i,  1)  =  lim  -  yis  (i)  *  (t) 
—  n-*-00  n  ' 


k=l 


t=k  ‘ 


t=k 


lim  iEs(i)  (t)  !  ‘  s  (1)  (P  (1)  - 1 

n  n  k  | t-k 


t=k 


(4.4.4) 


In  the  absence  of  doppler  scaling, 


D  ( i,  1)  =  lim  i  s*  ( i)  (t) 
—  n^°°  n 


t=k 


•  s (1)  (t) 


t=k 


(4.4.5) 


Define  the  cross  ambiguity  function  between  s(i)  (t)  and 
s  (1)  (t)  as: 


XS  (i)  ,S  (l)  {T'P)  =  k  /  s*(i)(t)  s(l)  (P(t-T)  )dt. 


Assuming  that  aliasing  errors  due  to  sampling  are  negligible 


eq.  (4.4.4)  is  the  cross  ambiguity  function  between  s(i)(t)  and 
s(l)(t)  at  zero  lag,  i.e. 


4.35 


=  Xs  (i)  ,  s  ( 1)  (0'P(1> 


Similarly, 


rU/1)  Xs(d)  ,s(l)  (0'1) 


Since  cn  is  equal  to  (from  (4.4.1a)  and  (4.4.2)) 


£n  ~  pn  *  2.^  £/  it  is  clear  that  the  estimation 

algorithm  will  be  insensitive  to  doppler  scaling  if  and  onlv  if 

-n~-n'  i,e*  Xs(i)  ,s  (1)  ^0,P  is  independent  of  P(l).  In 
other  words,  for  doppler  insensitive  adaptive  estaimtion,  s(i) 
and  s  (1)  should  be  chosen  such  that 


*s(i),s(l)(0'p(1))  ^(i)  ,s(l)  (0,1) 


(4.4.6) 


conversely,  if  *s  (i) /S (1)  (o,P (1) )  2  0  for  P(l)  ji  1,  i.e.  if  s(i) 
and  s (1)  are  doppler  sensitive  signals,  2  0  irrespective  of  the 
value  of  c. 

A  class  of  wideband  doppler  insensitive  frequency  modulated 
signals  has  been  developed  by  Altes  and  Titlebaum  [  E  ].  They 
have  the  general  form  u(t)  =  a(t)  ejblnt  where  a(t)  and  b  control 
the  time  width,  bandwidth  and  rate  of  variation  of  instant/  aneous 
frequency  of  u(t) .  This  class  of  signals  has  the  property 


:  ejblnP  *>...(T,1). 


If  the  input  signals  {s^}  are  chosen  from  this  class  of 
signals  as 

{-(t)  } 


the  estimate  of  c  in  the  presence  of  doppler  scaling  will  be 
cn  =  (0(1)  ejb(l)lnP(li(2)jb(2)lnP(2)/_>e(L)  e jb (1) InP (L) , T_ 

While  this  estaimte  is  not  independent  of  the  doppler  seal- 
ing  factors  (P(l),  ...  P(L))  ,  these  appear  in  c^  in  a  relatively 
tractable  fashion  as  a  simple  phase  shift  of  each  component  of 

c . 

Another  class  of  noise  like  signals  with  desirable  doppler 
insens itity  properties  was  suggested  to  the  author  by  Prof. 

E.  Titlebaum.  They  are  of  the  form 

r 

u(t)  =  £  a  ( i)  f  (P(i)  (t) ) 
i=l 


=  (a (1)  (t) 


jb ( 1) lnt 

“  t 


a (2 ) (t)  e 


jb  ( 2 ) lnt 


a  (L)  (t)  e 


jb ( t) lnt} 


where  f(t)  is  generated  by  suitably  modulating  a  pseudo  random 
sequence.  We  have  not  under taken  any  intensive  study  of  the 
ambiguity  functions  of  such  signals. 


Section  IV. 5. 


A  simplification  of  the  EWLS  algorithm. 

The  EWLS  algorithm  has  been  shown  to  be  able  to  track  slow 
time  variations  of  £n  and  with  a  choice  of  doppler  tolerant 
input  signals,  to  be  insensitive  to  doppler  scaling  effects.  T.I 
main  practical  difficulty  in  the  application  of  this  algorithm 
is  that  it  is  computationally  cumbersome.  It  is  worthwhile  to 
look  for  possible  simplification  that  will  make  the  EWLS 
algorithm  more  attractive  for  real  time  applications. 

consider  eq. (3 . 2. 10  )  f or  updating  c^: 


— n+1 


c 

— n 


Pn  Q 

+  Zn  Z 


-1 


d-Q)  s*+1 


(X  ,“S  .1  C  ) 

n+1  -n+1  -n 


1  +  Sn+1  £«  a 


Q'1  d-Q)  »J+1 


Let  Q  -  U  +  aR  )  1  and  P  =  R  +  n 
Z  Z  Zss  — n  -ss  Ai 


where  is  the  time  varying  component  of  P  -1.  For  small  a 
we  have  argued  in  Section  IV. 1.  that  the  power  (or  variance) 
of  each  component  of  N  is  "small". 


P 

— n 


R 

— ss 


-1 


-  R 

— ss 


-1 


N.,  R 

— u  -ss 


-1 


Let 


ep 


n+1 


x 


n+1 


T 

—n+1 


Using  these  definitions  and  approxiamtions ,  the  second 
on  the  right  hand  side  of  eq. (3.2.10)  can  be  replaced  by 


term 


-n+1 


=tn+l 


^n+l  ^n+1 


o2isn+1i2  e  pn+1  (i-s,;1  ^)^n+1)/ 


d+^Jid-Rss'1  Nu>^nJi) 


Define  the  "misad justment"  vector 


“  epn+1  (jiss"1  Eu  +“  l§.n+1l2<I-Sss"1  Su>>*‘n+i 

(1  +  “l^+xl2  -  “  £n+!  *ss  Ei  i*n+l 


(4.5.1) 


Due  to  the  fact  that  a  is  "small"  ani  each  element  of  N 

— u 

is  "small",  each  element  of  n  is  "small"  compared  to  aep  , 

— ma  n+1 

sn+*.  For  special  cases  of  {s^},  such  as  pseudo  random  noise  and 

sums  of  sinusoids,  the  expression  for  n^  may  be  evaluated 

explicitly.  For  more  general  classes  of  input  signals,  it  appears 

hopeless  to  study  the  properties  of  n^a  analytically. 

With  the  reasonable  assumption  that  n  is  "small",  we  may 

— ma 

ignore  it  altogether  and  write  £n+-^  (eq.  3.  )  as 


c  ,  ,  =  c  +  a  s  s  c  ) 

—n+1  — n  —n+1  —n+1  —n+1  — n 


(4.5.2) 


As  before,  c^is  arbitrary. 

Eq.  (4.5.2)  will  be  called  the  "Approximate  Least  mean 
Squares"  (ALMS)  algorithm.  We  have  derived  this  algorithm  from 
the  EWLS  algorithm  with  Q  =  (I  +  a  R  )  1  simply  by  omitting  a 


small  "maladjustment"  term.  Except  for  this  additional  source  of 
error,  our  analysis  of  the  EWLS  algorithm  carries  over  to  the 
ALMS  algorithm  also.  Specifically,  our  discussions  of  the  track¬ 
ing  behavior  and  the  doppler  sensitivity  of  the  EWLS  algorithm 
aPPly  to  the  ALMS  algorithm  also. 

The  ALMS  algorithm  is  similar  to  the  so  called  "Least  Mean 
Square"  (LMS)  algorithm  developed  by  Widrow  [51].  Widrow's 
version  of  the  LMS  algorithm  reads 


c  =  c  + 
— n+1  — n 


(x_  -  s 


c  ) 


— n  — n  — n  — n 


(4.5.3) 


The  discrepancy  between  (4.5.2)  and  (4.5.3)  is  probably  due 
to  Widrow's  use  of  some  dubious  "approximations"  such  as  "estimate 
the  mean  of  the  stochastic  gradient  by  its  instantaneous  value" 
etc.  (sic! )  [ 51,54  ] . 

Several  interesting  signal  processing  applications  of  the  LMS 
algorithm,  such  as  prediction,  antenna  array  design,  noise  cancel¬ 
lation  and  network  modeling  have  been  discussed  by  McCool  [  37  ] 
and  Widrow  and  his  coworkers  [52,53  ].  Some  other  interesting 

applications  of  the  LMS  algorithm  are  in  the  areas  of  equalization 
in  communication  channels  [20,40],  echo  cancellation  [48]  and, 
signal  to  noise  ratio  maximization  [47,55],  and  instantaneous 
frequency  estimation  [23].  Analog  equivalents  of  the  LMS  algorithms 
have  also  been  considered  bv  researchers  [49], 


(4.40) 


Section  V. 

Conclusion 

An  exponentially  weighted  sequential  least  squares  algorithm 
(EWLS) ,  which  is  a  modification  of  the  classical  least  squares 
algorithm,  has  been  developed.  The  Widrow-Hoff  LMS  algorithm  is 
derived  as  an  approximation  to  this  algorithm.  The  only  condi- 
tion  on  the  input  signals  (sk>  is  that  the  sequence (s*  £k  } 
should  have  a  finite  positive  definite  mean  and  an  integrable 
power  spectrum.  This  implies  that  a  wide  variety  of  (s^s, 
such  as  pseudorandom  sequences,  correlated  noise  like  sequences 
and  periodic  pulsed  FM  signals  are  suitable  for  processing  with 
the  LMS  adaptive  algorithm.  Analytic  expressions  (not  all  of 
which  are  of  much  practical  value)  have  been  derived  to  explain 
the  significant  properties  of  the  EWLS  (and  hence,  the  LMS) 
algorithm,  such  as  the  "misad justment  noise",  stability,  error  ^ 

in  tracking  time  varying  parameters,  effect  of  additive  noise  and 
the  role  of  input  signal.  In  particular,  the  usefulness  of 
choosing  an  input  signal  sequence  whose  wideband  ambiguity  func¬ 
tion  is  insensitive  to  doppler  scaling  becomes  transparent. 


Section  VI . 


Topics  for  rurther  research . 

1.  The  adaptive  algorithms  considered  in  this  report  have 
assumed  that  the  observations  are  linear  functions  of  the  unknown 
parameters  c^.  The  possibility  of  extending  these  results  to 
nonlinear  regression  problems  can  be  investigated.  Section  T1.2. 
on  Kalman  filtering  indicates  how  this  problem  can  be  approached  for 
time  invariant  c.  The  question  is  whether  and  how  the  nonlinear 
Kalman  identification  algorithms  can  be  modified  to  apply  to 
time  varying  situations.  Possible  applications  include  phase 
tracking  in  coherent  communcation  systems  and  simultaneous 

estimation  of  doppler  seales  and  dispersion  amplitudes  in  communica¬ 
tions  channels. 

The  estimation  errors  in  adaptive  filtering  with  the  approxi¬ 
mate  least  mean  squares  algorithm  are  highly  dependent  on  ci,  the 
gain  constant.  The  "misadjustment "  noise  and  additive  noise 
components  of  the  error  decrease,  and  the  tracking  error  increases, 
with  decreasing  a.  Presumably,  there  is  an  optimum  choice  of  a 
for  which  the  total  error  is  minimized.  In  view  of  the  com¬ 
plexity  of  the  expressions  for  the  three  sources  of  error,  it 
seems  that  a  well  chosen  variable  gain  sequence  is  the  only 
feasible  way  to  find  the  optimum  a. 

3.  We  have  shown  that  the  "misadjustment"  noise  in  the  ALMS 
algorithm  is  related  to  the  spectral  density  function  of  each 
term  of  (£k*  }.  The  misadjustment  noise  will  be  reduced  if 

the  sequences  (s^*(i)  s^(l))  for  each  i  and  1  have  relatively 


(4.41) 


Section  VI . 

Topics  for  further  research. 

1.  The  adaptive  algorithms  considered  in  this  report  have 
assumed  that  the  observations  are  linear  functions  of  the  unknown 
parameters  c^.  The  possibility  of  extending  these  results  to 
nonlinear  regression  problems  can  be  investigated.  Section  II. 2. 

on  Kalman  filtering  indicates  how  this  problem  can  be  approached  for 
time  invariant  c.  The  question  is  whether  and  how  the  nonlinear 
Kalman  identification  algorithms  can  be  modified  to  apply  to 
time  varying  situations.  Possible  applications  include  phase 
tracking  in  coherent  communcation  systems  and  simultaneous 
estimation  of  doppler  seales  and  dispersion  amplitudes  in  communica¬ 
tions  channels. 

2.  The  estimation  errors  in  adaptive  filtering  with  the  approxi- 
mate  least  mean  squares  algorithm  are  highly  dependent  on  a,  the 
gain  constant.  The  "misadjustment"  noise  and  additive  noise 
components  of  the  error  decrease,  and  the  tracking  error  increases, 
with  decreasing  a.  Presumably,  there  is  an  optimum  choice  of  a 
for  which  the  total  error  is  minimized.  In  view  of  the  com¬ 
plexity  of  the  expressions  for  the  three  sources  of  error,  it 
seems  that  a  well  chosen  variable  gain  sequence  is  the  only 
feasible  way  to  find  the  optimum  a. 

3.  We  have  shown  that  the  "misadjustment"  noise  in  the  ALMS 
algorithm  is  related  to  the  spectral  density  function  of  each 
term  of  {s_,^*  s^  }.  The  misadjustment  noise  v. ill  be  reduced  if 
the  sequences  (s^*(i)  s^(l))  for  each  i  and  1  have  relatively 
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